ADVANCED CONCEPTS OF APPROXIMATE REASONING

FINAL  TECHNICAL  REPORT 
Executive  Summary 

Enrique  H.  Ruspini 
Artificial  Intelligence  Center 
SRI  International 


1  Introduction 

This final report consists primarily of a collection of papers that have been published,
presented, or await publication in various forums, presenting the results of research, sponsored
by the U.S. Army Research Office, on an artificial intelligence discipline known as “Approximate
Reasoning.”

This collection includes detailed technical presentations of approximate reasoning
issues [1,6], various summaries of those presentations [4,5,8], and an encompassing overview
of their significance [2] in the context of a unified formal framework developed as part of the
reported research.

For this reason, we have chosen a format based on inclusion of all papers relevant to our
research, preceded by this executive summary, which is also intended to guide the interested
reader to the diverse works that make up the bulk of the report.

The research program on advanced concepts of approximate reasoning had the goal of
establishing firm formal foundations that explain the different technologies proposed to solve
the problems associated with the processing of imprecise and uncertain information, permit
a comparison of their advantages and disadvantages, and, especially, allow the determination
of their applicability to specific problems.

The research results reported herein clarify fundamental aspects of information processing
under conditions of imprecision and uncertainty. These results represent particularly
important  steps  toward  the  development  of  systems  for  analogical  reasoning,  i.e.,  automated 
devices  that  exploit  similarities  between  scenarios  to  “extrapolate”  from  known  examples 
into  unknown  situations. 

Because of their fundamental nature, these results are applicable to a wide variety of
problems of Army interest, including intelligence analysis, autonomous device planning and
control, vulnerability analysis, human factors engineering, material analysis, fault diagnosis,
reliability analysis, system design, and mission planning and counterplanning.

On the basis of the nature of the results obtained during this research and current practical
experience with the applicability of various approximate reasoning techniques, it is
possible to identify the following applications as being particularly amenable to treatment
in the near future:

1.  Control  of  unstable  systems,  such  as  helicopters,  land  vehicles,  or  weapon  platforms, 
by  means  of  possibilistic  control  techniques 

2.  Control  of  navigation,  target  tracking,  and  obstacle  avoidance  by  autonomous  mobile 
agents 

3. Elimination of involuntary platform/hand movement in object-tracking tasks.

4. Development of vulnerability measures and related assessments of structural viability.

5. Development of approximate models of complex systems.

6. Coordination of real-time intelligent agents on the basis of considerations about their
usefulness, associated risks, and probability of success.

2  Approximate  Reasoning 

Approximate Reasoning is the collective name given to a variety of automated methods and
techniques  for  the  analysis  of  imprecise  and  uncertain  information. 

The  first  task  in  our  investigation  was  to  clarify  the  nature  of  the  approximate  reasoning 
problem:  a  poorly  understood  question  that  was  felt  to  be  the  basic  cause  of  the  controversy 
that  characterized  the  state  of  the  art.  Prior  characterizations  of  approximate  reasoning 
technology broadly interpreted the epithet “approximate” as an indication of either the poor
quality of the underlying knowledge or that of the proposed techniques, considered to be
heuristic imitations of the sounder methods of classical logic.

Our approach to the characterization of the approximate-reasoning problem was based
on continuation of previous work of the principal investigator (“The Logical Foundations of
Evidential Reasoning,” SRI AIC Technical Note No. 408, 1987), which relied on the logical
notion of “possible world.” The result of these investigations was the development of a
unified framework for the approximate reasoning problem that is briefly summarized in a
paper presented at the Fourth International Symposium on Knowledge and its Engineering [7]
and that is considerably expanded in a related assessment of the state of the art and its
progress [2].

Informally speaking, possible worlds are the conceivable situations, scenarios, states, or
behaviors of a real-world system, i.e., the conceivable solutions of a typical situation- or
state-assessment problem. In those problems, we are typically required to state whether the
system in question (e.g., “the weather at Menlo Park”) is (or was, or will be) in such a
state that certain statements (called hypotheses) about it are true (e.g., “... will be rainy on
November 15”).

To answer such questions in the context of a typical reasoning problem, we usually make
various observations of our system (e.g., temperatures, pressures) that, when combined with
existing background knowledge (e.g., meteorology), eliminate certain conceivable possibilities
from consideration. The remaining states, called in our model the evidential set because of
its obvious relationship with observed evidence, are then examined to determine whether
the hypothesis of interest is true in all of them or false in all of them.

If that is indeed the case, as illustrated in Figure 1, then the problem is a conventional
reasoning problem capable of being, at least conceptually, solved by classical logical
techniques (i.e., the evidence implies either the hypothesis or its negation).

In an approximate reasoning problem, however, the situation resembles that illustrated
in Figure 2, where some of the possibilities that are consistent with the evidence are such
that the hypothesis is true, while in others it is false. Being unable to determine whether
the hypothesis is true or false, all approximate reasoning methods, in one way or another,
modify the problem to be solved, concentrating instead on describing the evidential set in
terms of its relationship with the hypothesis of interest.
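
The distinction between the conventional and the approximate reasoning problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three propositions, the background rule, and the function names (possible_worlds, evidential_set, assess) are hypothetical and do not come from the reported research.

```python
from itertools import product

# Hypothetical propositions about tomorrow's weather (illustrative only).
PROPOSITIONS = ("low_pressure", "cloudy", "rainy")


def possible_worlds():
    """Enumerate every assignment of true/false to the propositions."""
    return [dict(zip(PROPOSITIONS, values))
            for values in product((True, False), repeat=len(PROPOSITIONS))]


def evidential_set(worlds, constraints):
    """Keep the worlds consistent with every observation and background rule."""
    return [w for w in worlds if all(c(w) for c in constraints)]


def assess(evidential, hypothesis):
    """Classify the hypothesis as 'true', 'false', or 'undetermined'."""
    if not evidential:
        raise ValueError("evidence is inconsistent: no possible world remains")
    outcomes = {hypothesis(w) for w in evidential}
    if outcomes == {True}:
        return "true"
    if outcomes == {False}:
        return "false"
    return "undetermined"


# Background knowledge (assumed): rain requires clouds.
background = lambda w: (not w["rainy"]) or w["cloudy"]
# Observations (assumed): the pressure is low; optionally, the sky is clear.
observed_low = lambda w: w["low_pressure"]
observed_clear = lambda w: not w["cloudy"]

will_rain = lambda w: w["rainy"]

worlds = possible_worlds()
mixed = evidential_set(worlds, [background, observed_low])
print(assess(mixed, will_rain))      # undetermined: the approximate case
decided = evidential_set(worlds, [background, observed_low, observed_clear])
print(assess(decided, will_rain))    # false: the conventional case
```

With only the pressure observation, three worlds remain and they disagree about rain, which is the situation that approximate reasoning methods are meant to describe rather than decide.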

Probabilistic reasoning methods, illustrated in Figure 3, for example, seek to determine
the proportion of evidential possibilities where a hypothesis is true (i.e., the conditional
probability of truth). This proportion is usually estimated with the aid of statistical tables
that summarize experience under similar circumstances.
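
As a minimal sketch of this proportion, the fragment below computes a conditional probability from a small table of past cases. The table contents and all names are invented for illustration; they are not data from the reported research.

```python
# Hypothetical table of past days: (low_pressure, rainy) -> number of days observed.
PAST_DAYS = {
    (True, True): 30,
    (True, False): 10,
    (False, True): 5,
    (False, False): 55,
}


def conditional_probability(table, evidence, hypothesis):
    """Weighted proportion of evidence-consistent worlds where the hypothesis holds."""
    consistent = {w: n for w, n in table.items() if evidence(w)}
    total = sum(consistent.values())
    if total == 0:
        raise ValueError("no recorded case is consistent with the evidence")
    favourable = sum(n for w, n in consistent.items() if hypothesis(w))
    return favourable / total


low_pressure = lambda w: w[0]
rainy = lambda w: w[1]

p_rain_given_low = conditional_probability(PAST_DAYS, low_pressure, rainy)
print(p_rain_given_low)  # 30 / (30 + 10) = 0.75
```

The evidence selects the rows that remain possible, and the counts play the role of the set measure defined on the evidential set.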

Possibilistic reasoning methods, on the other hand, rely on measures of resemblance and
similarity to determine, as illustrated in Figure 4, to what extent evidential possibilities
resemble, or are close to, the set of possibilities where the hypothesis is true. The similarity
measure that makes such a characterization possible is intended to be a measure of the
extent to which facts that are true in one situation or scenario are true in another. For
example, assessments of the stability of a weapon platform under some assumptions will
remain approximately valid for similar platforms.
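
The similarity-based view can be sketched as follows. The aggregation used here (closeness of each evidential world to its most similar hypothesis world, then the worst and best case over the evidential set) is one plausible reading adopted for illustration, not a formula quoted from this summary; the temperature universe and the similarity function are likewise assumptions.

```python
def similarity(a, b, scale=10.0):
    """Hypothetical resemblance between two temperatures (1 = identical, 0 = unrelated)."""
    return max(0.0, 1.0 - abs(a - b) / scale)


def closeness_to_set(world, target_set):
    """Similarity of one world to the most similar member of target_set."""
    return max(similarity(world, t) for t in target_set)


def resemblance_bounds(evidential, hypothesis_worlds):
    """Return (worst case, best case) closeness of the evidential worlds to the hypothesis."""
    degrees = [closeness_to_set(w, hypothesis_worlds) for w in evidential]
    return min(degrees), max(degrees)


# Possible worlds are afternoon temperatures in degrees Celsius (assumed toy universe).
universe = range(0, 41)
warm = [t for t in universe if t >= 21]      # worlds where "it is warm" holds
evidence = [18, 20, 22]                      # worlds left after observation

lower, upper = resemblance_bounds(evidence, warm)
print(lower, upper)
```

Unlike the probabilistic proportion, these two numbers say nothing about how likely the hypothesis is; they describe how far the evidential worlds lie from the worlds where it holds.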


Figure  1:  The  conventional  reasoning  problem. 


3  Possibilistic  Reasoning 

Having in the past successfully utilized possible-world models to describe the conceptual
bases of probabilistic reasoning and its generalizations, notably the Dempster-Shafer calculus
of evidence, our attention during the reported research was primarily focused upon the formal
characterization of possibilistic (i.e., “fuzzy logic”) methods according to the similarity-based
model that is briefly described above.

The major result of this research was a semantic model that was summarized in a number
of publications and presentations [1,7,8,9,10] and that is discussed in detail in a technical
note [1], soon to be published in the International Journal of Approximate Reasoning.

The  major  characteristics  of  this  model  are: 

• its ability to describe possibilistic techniques as the result of imposing metric structures
upon a set of possible worlds rather than as the consequence of defining certain set
measures (i.e., probabilities) on that set,

• its characterization of the metric properties of similarity or resemblance functions using
operators previously considered only in the context of multivalued logics and the theory
of probabilistic metric spaces (i.e., triangular norms),

• its description of metric relations between pairs of possible states or scenarios using
well-known topological concepts (i.e., the Hausdorff distance), and the identification of
relationships between such notions and the notions of unconditioned and conditional
possibility distributions,

• the validation of the generalized modus ponens, the major inferential procedure of
fuzzy logic, as a generalization of its classical counterpart,

• its ability to provide cogent descriptions of approximate relations between system variables.
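
The connection between triangular norms and metric structure mentioned in the list above can be illustrated with a small check. A similarity function S is called T-transitive for a triangular norm T when S(x, z) >= T(S(x, y), S(y, z)) for all x, y, z. For the Lukasiewicz norm and S = 1 - d, this condition is exactly the triangle inequality for the distance d. The sketch below, with hypothetical function names and sample points, verifies this on a few points of the unit interval and shows that the same similarity is not transitive under the product or minimum norms.

```python
from itertools import product


def t_min(a, b):
    """Minimum triangular norm."""
    return min(a, b)


def t_product(a, b):
    """Product triangular norm."""
    return a * b


def t_lukasiewicz(a, b):
    """Lukasiewicz triangular norm."""
    return max(0.0, a + b - 1.0)


def is_t_transitive(sim, points, tnorm, tol=1e-12):
    """Check S(x, z) >= T(S(x, y), S(y, z)) for every triple of points."""
    return all(sim(x, z) + tol >= tnorm(sim(x, y), sim(y, z))
               for x, y, z in product(points, repeat=3))


def sim_from_distance(x, y):
    """Similarity induced by the distance |x - y| on the unit interval."""
    return 1.0 - abs(x - y)


points = [0.0, 0.25, 0.5, 1.0]
print(is_t_transitive(sim_from_distance, points, t_lukasiewicz))  # True
print(is_t_transitive(sim_from_distance, points, t_product))      # False
print(is_t_transitive(sim_from_distance, points, t_min))          # False
```

The choice of triangular norm therefore fixes which kind of generalized metric the similarity relation corresponds to.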

Ongoing research, to be reported in the immediate future, is currently concerned with
the  following  issues: 

• Derivation of similarity measures from possibility measures. The semantic model described
above has clearly established that possibilistic logic procedures rely on notions
of similarity between plausible states of the world rather than on measures of the
relative likelihood of such possibilities. While this model was developed primarily to
improve understanding of fundamental conceptual matters, the relations that were uncovered
during such development have significant implications of a practical nature. Of
particular importance is the potential ability to derive similarity measures (the bases
for such analogical processes as case-based reasoning) from possibility distributions,
the formal expression of important qualitative physical laws. We have developed initial
formulations for the derivation of such similarity measures on the basis of a formal result
of L. Valverde on the representation of similarity measures.

• The role of the notion of negation in possibilistic logic. Conventional modal logics are
concerned with the qualification of the truth of propositions by describing such truth
as being either necessary (i.e., the unavoidable consequence of basic assumptions and
the rules of logic), or contingent (i.e., the consequence of assumptions applicable to the
particular situation under consideration). These considerations are the bases for the
concepts of possibility and necessity, which are related by a straightforward duality relation
(based on the notion of negation) stating that something is possible if its negation is
not necessary.

Our model of the semantics of fuzzy logic, while introducing graded (i.e., relative)
notions of possibility and necessity based on measures of similarity, did not relate such
notions using the concept of negation. Identification of such a relationship, however,
is of significant conceptual and practical importance, as knowledge of what is possible
under certain circumstances may yield important information as to what is necessary
in other cases.

Study of duality relations between pairs of subsets of possible worlds has led to the
definition of new concepts of negation that are closely associated with the relations
that exist between linguistic qualifiers that are antonyms of each other (e.g., [rich,
poor] rather than [rich, not-rich]).

• The study of the roles of system variables and concepts of independence in possibilistic
logic. We have studied the relations between similarity functions defined from the
joint viewpoint of several variables (e.g., as when objects are differentiated using multiple
attributes such as color, volume, shape) and marginal similarities that take into
account only certain subsets of variables (e.g., measures of resemblance based solely
on color). We have derived initial formulations for the derivation of joint similarity
measures from their marginals and vice versa.

Furthermore, study of the relationships that hold between similarity measures defined
from diverse viewpoints has led to the definition of possibilistic measures of independence
(or interaction) between variables. The results, which will be reported in a
technical note that is currently under preparation, are of major practical importance
to simplify complex processes of possibilistic inference (i.e., providing a possibilistic
counterpart to the probabilistic methods of network decomposition). We are currently
investigating representation formulas to derive marginal similarity functions without
having to resort to transitive extension (i.e., chaining) of certain nontransitive relations.
Availability of such formulas will greatly improve the efficiency of inferential
processes.

• We are also investigating the conceptual relations between the important decision-theoretic
notions of utility, cost, desirability, and preference. The central idea, based
on concepts proposed by Rescher (N. Rescher, “Semantic Foundations for the Logic
of Preference,” in N. Rescher, editor, The Logic of Decision and Action, Pittsburgh,
1967), is that such notions may be logically formalized by measures that quantify
our preference to be in certain states of the world rather than others. Preliminary
results indicate that a utility-based model will provide an even broader formal basis
for possibilistic logic, while relating such preference measures with the metric structures
of our basic model.

• We have developed a possibilistic formulation for the control of navigation and
obstacle avoidance by autonomous vehicles that is currently being tested in the context
provided by the SRI Autonomous Mobile Agent Platform.
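
The first item in the list above refers to a representation result of Valverde for similarity measures. A commonly stated form of that result builds a T-transitive similarity from a family of fuzzy sets h by taking S(x, y) as the smallest degree of agreement E(h(x), h(y)) over the family, where E is the biresiduum of the triangular norm T. The sketch below uses the Lukasiewicz case, for which E(a, b) = 1 - |a - b|. The two possibility distributions and all names are invented for illustration, and this is not necessarily the formulation developed in the reported research.

```python
def biresiduum_lukasiewicz(a, b):
    """Degree to which two membership values agree (Lukasiewicz case)."""
    return 1.0 - abs(a - b)


def similarity_from_distributions(distributions):
    """Build S(x, y) = min over distributions h of E(h(x), h(y))."""
    def sim(x, y):
        return min(biresiduum_lukasiewicz(h[x], h[y]) for h in distributions)
    return sim


# Hypothetical possibility distributions over four temperature states.
cold = {"freezing": 1.0, "cool": 0.75, "mild": 0.25, "hot": 0.0}
warm = {"freezing": 0.0, "cool": 0.25, "mild": 0.75, "hot": 1.0}

sim = similarity_from_distributions([cold, warm])
print(sim("freezing", "cool"))  # 0.75
print(sim("cool", "mild"))      # 0.5
print(sim("freezing", "hot"))   # 0.0
```

Two states are judged similar exactly to the extent that every distribution in the family treats them alike, which is how qualitative knowledge expressed as possibility distributions can induce a measure of resemblance.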


4  Probabilistic  Reasoning 

We have also continued to investigate various issues of probabilistic reasoning, focusing upon
questions of validity and generality of the Dempster-Shafer calculus of evidence.

We have given special attention to the discussion of recent concerns, raised within the
technical community, about the conceptual soundness of this approach. Our contribution
to this exchange, intended primarily to clarify various confusions and misconceptions, was
summarized in a paper presented at the Third International Conference on the Management
of Imprecision and Uncertainty by Expert Systems [5], which is expanded upon in an
unpublished manuscript [6], currently under submission, that is enclosed as part of this final
report.

We have also continued our previous research on generalized probabilistic methods, emphasizing
the study of issues related to the treatment of conditional and dependent evidence.
We have determined that, for reasonable definitions of conditional evidence distributions in
the context of the DS calculus of evidence, these distributions are such that their combination
with unconditioned evidence usually results (even for simple examples) in probability
bounds that cannot be expressed within the Dempster-Shafer framework.
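
For orientation, the probability bounds that the Dempster-Shafer framework can express are the belief and plausibility values induced by a mass assignment over subsets of a frame. The sketch below computes both for a hypothetical three-element frame; the frame, the masses, and the function names are assumptions made for illustration, and the example does not reproduce the conditional-evidence construction discussed above.

```python
def belief(mass, subset):
    """Lower bound: total mass committed to focal elements inside subset."""
    return sum(m for focal, m in mass.items() if focal <= subset)


def plausibility(mass, subset):
    """Upper bound: total mass of focal elements that intersect subset."""
    return sum(m for focal, m in mass.items() if focal & subset)


FRAME = frozenset({"rain", "snow", "dry"})

# Hypothetical basic probability assignment; masses sum to 1.
mass = {
    frozenset({"rain"}): 0.5,
    frozenset({"rain", "snow"}): 0.25,
    FRAME: 0.25,
}

rain = frozenset({"rain"})
print(belief(mass, rain), plausibility(mass, rain))  # 0.5 1.0
```

Any probability of rain between 0.5 and 1.0 is compatible with this evidence. Interval bounds that cannot be generated by any such mass assignment fall outside the framework, which is the situation the paragraph above describes.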

In connection with these investigations, we have derived a preliminary formulation of the
problem of combination of conditional and unconditioned distributions as a linear program.
In general, however, the solutions of such a problem will not obey the axioms of the calculus
of evidence. Currently, we are focusing our attention upon three major questions:

• the determination of cases where the result of evidential conditioning is a belief function,

• the approximation of results not satisfying evidential axioms by belief functions that do,

• the development of a more general evidential calculus based on the notion of lower and
upper probabilities.
Abstract


This  note  presents  a  personal  view  of  the  state  of  the  art  in  the  representation 
and  manipulation  of  imprecise  and  uncertain  information  by  automated  processing 
systems.  To  contrast  their  objectives  and  characteristics  with  the  sound  deductive 
procedures  of  classical  logic,  methodologies  developed  for  that  purpose  are  usually 
described  as  relying  on  Approximate  Reasoning. 

Using  a  unified  descriptive  framework,  we  will  argue  that,  far  from  being  mere 
approximations  of  logically  correct  procedures,  approximate  reasoning  methods  are 
also  sound  techniques  that  describe  the  properties  of  a  set  of  conceivable  states  of  a 
real-world  system.  This  framework,  which  is  based  on  the  logical  notion  of  possible 
worlds,  permits  the  description  of  the  various  approximate  reasoning  methods  and 
techniques  and  simplifies  their  comparison.  More  importantly,  our  descriptive  model 
facilitates  the  understanding  of  the  fundamental  conceptual  characteristics  of  the 
major  methodologies. 

We  examine  first  the  development  of  approximate  reasoning  methods  from  early 
advances  to  the  present  state  of  the  art,  commenting  also  on  the  technical  motivation 
for  the  introduction  of  certain  controversial  approaches. 

Our  unifying  semantic  model  is  then  introduced  to  explain  the  formal  concepts  and 
structures  of  the  major  approximate  reasoning  methodologies:  classical  probability 
calculus,  the  Dempster-Shafer  calculus  of  evidence,  and  fuzzy  (possibilistic)  logic. 
In  particular,  we  discuss  the  basic  conceptual  differences  between  probabilistic  and 
possibilistic  approaches. 

Finally,  we  take  a  critical  look  at  the  controversy  about  the  need  and  utility  for 
diverse  methodologies,  and  assess  requirements  for  future  research  and  development. 
1  Introduction


This  note  presents  a  personal  view  of  the  state  of  the  art  in  approximate  reasoning, 
the  name  used  to  describe  several  methodologies  for  the  development  of  intelligent 
systems  capable  of  manipulating  imprecise  and  uncertain  information. 

Approximate  reasoning  techniques  loosely  based  on  the  calculus  of  probability 
appeared  almost  simultaneously  with  the  development  of  expert  systems  relying  on 
classical  (i.e.,  two-valued)  logic  techniques.  Soon  after  these  systems  were  introduced, 
other  approaches  to  the  treatment  of  uncertainty  and  imprecision  were  also  proposed, 
both  to  generalize  more  or  less  conventional  probabilistic  schemes  and  to  capture  other 
aspects  of  imperfect  knowledge,  claimed  to  have  a  nonprobabilistic  nature. 

The short technological history of approximate reasoning methods may be described
as being, from that moment, one of extreme controversy that has lasted to
this day. Most of the proponents of classical probabilistic treatments, often described,
although vaguely and somewhat misleadingly, as Bayesians,¹ have doubted the necessity
for the introduction of other conceptual structures and have often sought to
explain those frameworks in terms of probabilistic notions. Proponents of alternative
approaches, on the other hand, have defended their techniques on the strength of
two main arguments: the practical problems associated with the parameter-intensive
procedures of conventional probability, often demanding knowledge of a large number
of probability values; and the nonprobabilistic nature of the uncertainties associated
with the use of vague concepts.

Much  of  this  disagreement  has  been  clearly  caused  by  misunderstandings  about 
the  fundamental  philosophical  characteristics  of  each  approach.  Lacking  a  suitable 
basis to interpret certain concepts, particularly those related to the “degrees of truth”
of  multivalued  logics,  it  has  been  impossible,  until  recently,  to  provide  an  adequate 
framework  to  discuss  fundamental  issues  in  a  rational  manner. 

This  position  paper  on  the  past  evolution  of  the  field,  its  present  state  of  the  art, 
and  desiderata  for  future  evolution  is  the  result  of  recent  research  by  the  author  in 
basic semantic issues that are germane to the foundations of approximate reasoning.
The presentation is based on the use of a central unifying framework: a formal model
of the approximate reasoning problem that explains the similarities and differences
between major methodologies. Using this “possible-worlds” model, we will also be
able to compare the rationale of nonmonotonic logic approaches with that of approximate
reasoning procedures. Although our model is a rigorous formalism, described
in detail elsewhere [32,33] in connection with the logical foundations of the Dempster-Shafer
calculus of evidence and fuzzy logic, our discussion will be kept as informal as
possible to facilitate understanding of our philosophical and technical position.

¹ The qualifier Bayesian is used in the context of statistics to describe proponents of a statistical
methodology and in the context of the philosophy of probability to denote various subjective views
of probability. In Artificial Intelligence, the term has been loosely applied both to those investigating
approaches based on the probability calculus and, more narrowly, to those espousing the
decision-theoretic methods of subjective probability.

We will contend that regarding probabilistic and possibilistic approaches as competing
alternatives is incorrect and confuses the need to describe different aspects of
reality with the adequacy or ability of probability as a measure of likelihood. We will
also take a critical look at the major claims supporting a narrow view of probability,
based on a subjectivist interpretation that regards all forms of rational decision-making
as necessarily demanding optimization of expected-utility functionals, and
we dispute claims that only such approaches are endowed with either a suitable or a
proven decision-theoretical apparatus.

On the basis of our theoretical arguments, and of recent success in the application
of various techniques to practical problems, we will also argue that future
accomplishment in the field lies in the rational development of tools leading to multiple
complementary views of the implications of evidence rather than in arbitrary
circumscription to a limited class of techniques and procedures.

2  The  Development  of  Approximate  Reasoning 

Intelligent systems relying on approximate reasoning techniques [8,39] appeared in the
1970s, at approximately the same time as other systems seeking to emulate the expertise
of specialists in diverse fields of endeavor. Problems related to the development
of the expert systems based on classical deductive procedures, however, were primarily
related to the need to organize knowledge and its processing in such a manner
as to assure an efficient derivation of the truth value of hypotheses (i.e., either true
or false). Systems such as MYCIN or PROSPECTOR, reasoning about medical
and geological systems, where knowledge is limited and where observations may be
difficult or impossible to make, were forced to deal, in addition, with issues that, to
this day, have almost completely consumed the attention of approximate reasoning
researchers.

These issues may be generally described as related to the extension of the basic
derivation rule of classical logic, the modus ponens, which states that from the validity
of an antecedent proposition p and that of the implication p → q, it is possible
to derive the validity of the consequent proposition q. Although a conventional expert
system, using classical rules of derivation, could be assumed to have sufficient
information to derive the validity of a hypothesis of interest, whenever knowledge
was scarce or uncertain it was necessary to resort to other schemes that qualified
in one way or another the meaning of the truth of propositions. Still imitating the
network-oriented techniques of truth-value propagation of two-valued logic, the approximate
reasoning schemes developed in early systems sought to propagate numeric
truth values that were loosely related to probabilistic interpretations of uncertainty.
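
The classical propagation scheme that these early systems imitated can be sketched as repeated application of modus ponens over a set of rules. The rule base and the function name forward_chain below are hypothetical; the sketch shows only the two-valued case, not any particular numeric extension used by the systems discussed.

```python
def forward_chain(facts, rules):
    """Apply modus ponens repeatedly: from p and p -> q, conclude q."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent in derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return derived


# Hypothetical rule base: each pair (p, q) stands for the implication p -> q.
rules = [("p", "q"), ("q", "r"), ("s", "t")]
conclusions = forward_chain({"p"}, rules)
print(sorted(conclusions))  # ['p', 'q', 'r']
```

Truth flows from p through q to r, while t is never derived because its antecedent s is not established. Approximate reasoning schemes replace this all-or-nothing flow with graded values.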

The concept of probability provides a most important tool to describe the state of
systems that are known under less than desirable informational circumstances. Arising
clearly from the need to make decisions despite undesirable knowledge handicaps,
the notion of probability, seriously studied from the seventeenth century, has always
played a major role in human judgment [16].

The appeal of probability as an instrument to assess system behavior is due to the
empirically observed property that is expressed by the long-run stability of occurrence
of certain events. Whether such a pattern of occurrence has been objectively quantified
through experimentation or historical observation (objective interpretation), or
is subjectively expressed by the willingness to gamble with certain stakes (subjective
interpretation), it is clear that it provides a rational basis to formulate
expectations about system state. Why would anybody, if such predictable stability
of occurrence could not be assured, be willing to consciously bet on some outcomes
rather than others if the real world defies any attempt at descriptive characterization?

Curiously enough, although probabilistic interpretations were always implicitly or
explicitly intended by the developers of early approximate reasoning systems, and
while the underlying calculi reflect such explanations, it seems also clear that the
machinery of these devices was primarily oriented toward the emulation of the propagation
schemes of classical logic with truth flowing from node to node through edges
corresponding to implication rules. Approximate truth, measured by numbers associated
with objective likelihood or expert confidence, also flowed from evidence to
hypothesis in a scheme that generalized the true-false dichotomy of multivalued logic.

Regardless  of  the  clearly  intended  probabilistic  interpretations  of  those  numbers, 
misgivings  about  their  meaning  and  utility  were  sufficient  to  plant  the  seeds  of  the 
ensuing  controversy.  Concerns  about  the  inability  of  probability  to  capture  notions  of 
evidential confirmation led the developers of MYCIN [39], for example, to introduce
modified  concepts  (“certainty  factors”)  as  an  alternative  to  direct  use  of  conditional 
probabilities.  In  spite  of  subsequent  studies  showing  that  such  certainty  factors  were 
related  to  probability  values  [18],  it  is  clear  that  these  worries  were  well  founded, 
having  been  already  eloquently  expressed  in  the  works  of  philosophers  of  science  [34]. 

Although such concerns are indeed important and, despite some claims to the
contrary, must still be properly addressed, other issues soon captured the attention
of those seeking to develop expert systems with approximate reasoning capabilities.
Beyond certain troublesome issues that were apparent when formulating the probabilistic
calculi used by PROSPECTOR, arising from inconsistencies between “expert
estimates” of probability values and the laws of probability, it was also clear to those
engaged in the development of new expert systems that a typical application required
estimation of a very large number of individual probability values [14], which were
neither available nor derivable from existing data.

In addition, other researchers, acquainted with the concepts and methods of multivalued
logic [31,13], advanced the notion that some of the “degrees of truth” being
propagated could be interpreted in a nonprobabilistic fashion. The theory of fuzzy
sets, introduced by Zadeh in 1965 [45], had been for some time the focus of attention
of these researchers and soon became a major source of techniques for the treatment
of uncertainty by use of nonprobabilistic schemes.

The variety of approximate reasoning methods arising from this diversity (expressed
as a preference toward either a variedly interpreted, more or less strict application
of classical probability schemes; as approaches seeking the expression of ignorance
about probability values, such as the Dempster-Shafer calculus of evidence; and as
nonprobabilistic schemes like fuzzy logic) has led to a controversy that has endured
to this day.

It  has  not  been  possible,  until  recently,  to  discuss  these  approaches  with  the  help 
of  a  unifying  framework  that  facilitates  the  interpretation  of  relevant  concepts  and  the 
comparison  of  alternative  methodologies.  This  unifying  framework  is  based  on  a  view 
of  approximate  reasoning  problems  as  those  wherein  the  truth-value  of  a  hypothesis 
cannot be deduced from available information.² In other words, several scenarios, all
consistent with evidence, may be conceived. In some of those situations the hypothesis
is  true,  while  in  others  it  is  false. 

The  logical  notion  that  we  will  use  to  characterize  such  conceivable  states  of  affairs, 
situations, or scenarios, is the concept of “possible world” utilized by Carnap [4]
in  his  logical  treatment  of  the  concept  of  probability,  which  was  also  employed  by 
Nilsson  [26]  to  derive  a  logic-based  methodology  for  probabilistic  reasoning. 

3  Possible-World  Models 

A  possible  world  may  be  briefly  described  as  a  function  that  assigns  one  and  only  one 
of  the  truth  values  true  or  false  to  every  proposition  (i.e.,  declarative  statement) 
about the system that is being reasoned about. If we seek to describe and study
the  weather  in  Menlo  Park,  for  example,  the  atmospheric  conditions  at  several  points 
in  time  are  described  by  assigning  specific  values  to  meteorological  variables  such  as 
temperature,  humidity,  and  rainfall,  or,  equivalently,  by  assigning  a  truth  value  to 

² Sometimes this characterization is extended to include those cases where that derivation is very
difficult. 
propositions such as


The  temperature  at  3PM  was  75°F. 

Since  the  value  of  system  variables  is  unique  (e.g.,  the  temperature  cannot  be  both 
75°F and 85°F at the same time), it is clear that each possible world (i.e., an assignment
of truth values) must satisfy certain consistency conditions that follow from the
axioms  of  classical  logic. 

In  approximate  reasoning  problems,  however,  we  can  usually  do  more  to  restrict 
the  extent  of  the  set  of  possible  worlds  that  may  conceivably  describe  the  state  of 
the  system.  Typically,  the  information  or  knowledge  about  the  state  of  the  system 
and  its  applicable  rules  of  behavior,  in  spite  of  its  deficiencies,  is  a  major  source  of 
constraints  that  further  limit  the  extent  of  the  situations  that  must  be  considered. 
The  subset  of  possible  worlds  that  is  logically  consistent  with  this  evidence  is  called 
the  evidential  set ,  and,  in  one  form  or  another,  is  the  concern  of  every  approximate 
reasoning approach. In any approximate reasoning problem, by definition, a
hypothesis is true in some of these evidential worlds and false in
others, as depicted in Figure 1.


[Figure 1: The approximate reasoning problem. Legend: worlds consistent with the evidence; worlds logically inconsistent with the evidence. Regions: hypothesis true; hypothesis false.]

The view of approximate reasoning problems that is afforded by this possible-world
perspective also simplifies the understanding of the objective of approximate
reasoning  approaches.  Lacking,  by  the  nature  of  the  problem,  the  ability  to  determine 
if  the  evidence  implies  whether  we  are  in  a  situation  where  a  hypothesis  is  true  or  in 
one where it is false, every approximate reasoning methodology seeks answers to a
different  problem:  that  of  describing  certain  properties  of  the  evidential  set. 
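
As an illustration only, the possible-world view described in this section can be sketched in a few lines of Python. The proposition names, the consistency condition, and the evidence below are hypothetical choices made for the example; they are not part of the original model, which places no restriction on how worlds are represented.

```python
from itertools import product

# Hypothetical atomic propositions for the weather example.
ATOMS = ("temp_is_75F", "temp_is_85F", "raining")

def all_worlds(atoms):
    """Enumerate every assignment of True/False to the atomic propositions."""
    return [dict(zip(atoms, values))
            for values in product((True, False), repeat=len(atoms))]

def consistent(world):
    """Consistency condition: the temperature cannot be both 75F and 85F."""
    return not (world["temp_is_75F"] and world["temp_is_85F"])

def evidential_set(worlds, evidence):
    """Keep the consistent worlds that also satisfy the evidence."""
    return [w for w in worlds if consistent(w) and evidence(w)]

def status(hypothesis, worlds):
    """Classify a hypothesis relative to a set of worlds."""
    truths = {hypothesis(w) for w in worlds}
    if truths == {True}:
        return "true in every evidential world"
    if truths == {False}:
        return "false in every evidential world"
    return "undetermined (approximate reasoning problem)"

# Assumed evidence: all that is known is that the temperature was not 85F.
def evidence(world):
    return not world["temp_is_85F"]

E = evidential_set(all_worlds(ATOMS), evidence)
print(len(E), status(lambda w: w["raining"], E))
```

With this assumed evidence the hypothesis "raining" holds in some evidential worlds and fails in others, which is exactly the situation that characterizes an approximate reasoning problem.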

4  The  Semantics  of  Approximate  Reasoning 

Our view of approximate reasoning methods as techniques to describe the evidential
subset³ $\mathcal{E}$ of possible worlds that are consistent with available information now allows
a more detailed look into their philosophical bases.

Probabilistic methods, regardless of their subjective or objective semantics, seek
to estimate measures of the subsets of the evidential set where a hypothesis $h$ is true
and where it is false, i.e., the values

$$\mu(h \wedge \mathcal{E}) \quad \text{and} \quad \mu(\neg h \wedge \mathcal{E}),$$

or other related quantities, such as likelihood ratios or conditional measures with
respect to the evidential set $\mathcal{E}$. The measure $\mu$ is, however, an aggregate measure of
set extension based on the additive law

$$\mu(p) + \mu(q) = \mu(p \wedge q) + \mu(p \vee q),$$

stating  that  its  value  over  a  set  may  be  derived  from  knowledge  of  its  value  over  a 
partition  of  nonintersecting  subsets.  Regardless  of  the  mechanism  used  to  derive  the 
weights associated with individual members of the subsets, it should be clear that
interactions  and  associations  between  possible  worlds  (e.g.,  distances)  do  not  play 
any  role  in  such  quantities.  Simply  stated,  all  that  matter  are  the  weights  of  each 
individual  point  (more  generally,  each  atomic  subset)  that  are  then  added  to  gauge 
the  extent  of  the  subset. 
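
The additive character of such measures can be checked numerically. The sketch below is only an illustration: the four worlds and their weights are hypothetical, and propositions are identified with the sets of worlds where they hold.

```python
# Hypothetical weights attached to four individual possible worlds.
weights = {"w1": 0.1, "w2": 0.4, "w3": 0.3, "w4": 0.2}

def mu(subset):
    """Measure of a set of worlds: the sum of the individual weights."""
    return sum(weights[w] for w in subset)

p = {"w1", "w2"}   # worlds where proposition p holds
q = {"w2", "w3"}   # worlds where proposition q holds

# Additive law: mu(p) + mu(q) = mu(p and q) + mu(p or q).
lhs = mu(p) + mu(q)
rhs = mu(p & q) + mu(p | q)
print(abs(lhs - rhs) < 1e-12)
```

Note that nothing in the computation refers to how close one world is to another; only the individual weights enter the result.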

Possibilistic  methods,  on  the  other  hand,  are  based  on  notions  of  proximity  and 
resemblance  between  pairs  of  possible  worlds.  This  association  or  similarity  is  also  a 
measure, albeit not one that may be expressed in terms of individual weights. Exploiting
the idea that, in many systems, statements that are true in certain situations
remain  approximately  true  in  similar  instances  (e.g.,  clothing  that  is  appropriate  when 
the  temperature  is  75°F  will  work  nearly  as  well  at  78°F),  the  purpose  of  possibilistic 
techniques  is  to  describe  the  evidential  set  in  terms  of  the  similarity  of  its  component 
possible  worlds  to  other  possible  worlds  used  as  reference  landmarks. 

The  basic  difference  between  probabilistic  and  possibilistic  methods,  therefore, 
goes  beyond  the  use  of  different  formulas  to  derive  truth  values.  The  methodologies 
are  based  on  different  conceptual  approaches  to  the  description  of  the  evidential  set; 

³ For simplicity, we refer loosely to sets and propositions as if they were the same objects.
they stress, in probabilistic reasoning, relative measures of set size, such as the ratio of
previously observed true and false cases, while, in possibilistic reasoning, they stress
binary measures of similarity that describe how far any conceivable scenario is from
certain significant situations.

In  both  approaches,  however,  the  objective  is  the  description  of  properties  of  the 
evidential  set  rather  than  of  any  of  its  particular  members.  By  contrast,  certain 
nonmonotonic logic techniques such as circumscription [24] rely on methods to choose
least-exceptional worlds in the evidential set by extension of the “closed-world
assumption” [30], i.e., the only propositions or predicates that are true are those that
are  known  to  be  true.  These  techniques  may  be  considered  general  procedures  to 
represent  states  of  evidential  knowledge  by  choice  of  prototypical  situations.  New 
evidence,  however,  may  force  retraction  of  some  of  the  assumptions  leading  to  the 
selection  of  other  evidential  worlds  as  prototypes.  Another  class  of  nonmonotonic 
reasoning  techniques,  while  generally  fitting  the  description  given  above,  relies  on 
prespecified  “default”  rules  [29]  to  control  the  choice  of  prototypical  worlds.  Since 
these rules are usually formulated on the basis of plausibility notions rooted in statistical
information (as in the famous example of Tweety and the flying ability of
most live birds), it is not surprising that the derivation techniques and rules of these
preferential  logics — a  name  indicating  their  definition  of  a  preferred  order  for  models 
of  a  situation — resemble  those  of  probabilistic  reasoning.  In  fact,  recent  developments 
strongly  point  to  the  existence  of  a  common  unifying  interpretation  for  both  [28,15]. 

4.1  Probabilistic  Reasoning 

There  can  be  little  argument  from  any  quarter  that  frequencies  of  occurrence  of  events 
satisfy the famous additive law that is axiomatized in the definition of set measure [17].
If propositions that describe event occurrence can be assigned one and only one
of the classical truth values, then it is obvious that, whenever such repetitive
occurrences are counted, the sum of positive and negative occurrences must add
up to the total number of relevant cases. As far as this objectivist interpretation
of  probability  is  concerned,  therefore,  there  is  little  doubt  that  classical  formalisms 
provide a suitable conceptual tool to capture the behavior of systems that expresses
itself, as experimentally observed, in the form of stable frequency values.

Probabilities,  viewed  from  the  perspective  of  our  possible-worlds  model,  may  be 
considered  as  the  basis  of  methods  providing  answers  to  a  question  that  is  related  to 
but  different  from  the  undecidable  issue  of  the  validity  of  a  hypothesis.  Unable  to 
state,  because  of  lack  of  information,  that  h  is  either  true  or  false,  we  describe  instead 
the  behavior  of  the  system  in  the  long  run,  by  calculating  the  frequency  of  occurrence 
under  similar  circumstances. 
Probabilistic reasoning schemes may be generally described as concerned with the
computation of the joint probability distribution of several system variables, based
on knowledge of the values of related marginal and conditional probability distributions.
Whenever the required values are available it is possible, conceptually at least,
to derive the required joint distributions. In fact, it may be fairly stated that, once
it was understood that such derivation should be the goal of probabilistic reasoning
systems, the attention of proponents of that methodological perspective has been almost
completely directed toward the development of methods to simplify the required
knowledge organization and manipulation [27].

Substantial  concerns  arise,  however,  regarding  what  must  be  done  when  the  needed 
probability values are not known. In applied science, when unknown systems and phenomena
are investigated, experiments are designed and performed to determine the
basic  laws  of  system  behavior,  which  are  typically  expressed  through  quantitative 
relationships.  If,  based  on  such  knowledge,  rational  courses  of  action  are  chosen,  the 
careful  scientist  is  then  able  to  explain  and  justify  his  decisions  on  the  basis  of  a  strong 
epistemological  apparatus  supported  both  by  empirical  observation  and  by  rational 
deduction.  This  scheme,  which  proceeds  from  information  acquisition  to  decision 
making, embodies the experimental method of modern science. From such a perspective,
probabilistic laws describe certain aspects of system behavior described by
parameters  that  are  estimated  using  the  same  methods  that  are  universally  accepted 
and  employed  in  applied  science. 

Another  view  of  probability,  however,  regards  probability  values  as  expressions  of 
the  degree  of  belief  of  rational  decision  makers  regarding  the  validity  of  hypotheses. 
This  degree  of  belief  is  quantified  by  the  amount  of  money  that  a  rational  gambler 
is  willing  to  bet  in  a  gamble  where  the  payoff,  if  the  unknown  truth  value  turns  out 
to  be  true,  is  $1.  The  probabilistic  behavior  of  these  degrees  of  belief  is  justified 
by  a  number  of  axiomatic  systems  [6,35]  providing  formal  support  not  only  to  this 
subjectivist  interpretation  of  probability  but  also  to  a  decision-making  methodology 
based  on  the  maximization  of  expected  utility.  Related  axiomatic  formulations  have 
also been developed to support the contention that the only correct procedure for
updating  such  beliefs  is  the  Bayes-Laplace  rule  [5]: 


$$\mathrm{Prob}(q \mid p) = \frac{\mathrm{Prob}(p \mid q)\,\mathrm{Prob}(q)}{\mathrm{Prob}(p)}$$
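
A small numerical instance may help fix ideas about this updating rule. The sketch below is an illustration with hypothetical values for the prior and the two conditional probabilities; the denominator is obtained from the law of total probability.

```python
def bayes(prob_p_given_q, prob_q, prob_p):
    """Bayes-Laplace rule: Prob(q | p) = Prob(p | q) * Prob(q) / Prob(p)."""
    if prob_p <= 0.0:
        raise ValueError("Prob(p) must be positive to condition on p")
    return prob_p_given_q * prob_q / prob_p

# Hypothetical values chosen only for illustration.
prob_q = 0.3                 # prior degree of belief in q
prob_p_given_q = 0.8         # probability of observing p when q is true
prob_p_given_not_q = 0.1     # probability of observing p when q is false

# Law of total probability supplies the denominator.
prob_p = prob_p_given_q * prob_q + prob_p_given_not_q * (1.0 - prob_q)

posterior = bayes(prob_p_given_q, prob_q, prob_p)
print(round(posterior, 3))
```

Every quantity on the right-hand side must be available before the rule can be applied, which is the practical difficulty that the following paragraphs discuss.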


A number of researchers have questioned, in the past, the purportedly rational
nature of these axiomatic systems. Their misgivings, which we share, arise both from
questions about the rationality of some specific axioms, as noted by Suppes [42], and
from observation of the behavior of rational decision-makers (including developers of
the axiomatic formalisms) that contradicts the sure-thing principle, as observed
by Allais [1] and Ellsberg [11]. Kyburg [21] has also raised substantial concerns about
the epistemological status and soundness of the subjectivist approach. The axiomatic
system of Cox has also been criticized for its assumption that beliefs are measured by a
single number [10] and, again, for the less-than-natural character of some axioms [38].

Proponents  of  this  stringent  orthodoxy  have  often  argued  that  behavior  departing 
from  their  theoretical  requirements,  however  prevalent,  is  actually  irrational.  Such  a 
claim,  however,  suffers  from  a  fundamental  methodological  flaw.  Rationality  should 
be  defined  in  terms  of  basic  requirements  that  demand  proper  consideration  of  two 
fundamental  factors:  observed  empirical  evidence  and  the  laws  of  logic.  By  requiring 
compliance with certain basic tenets of rational behavior, such as the famous avoidance
of “Dutch books,” subjectivist schemes certainly attempt to meet one of these
requirements, albeit in a limited fashion, as pointed out by Kyburg [21]. By defining
rational  behavior  as  that  which  results  from  utilization  of  the  proponent’s  favorite 
scheme,  the  characterization  of  rationality  is  subjected  to  a  curious  argument  that 
inverts  the  identity  of  what  is  rational  with  what  must  be  done  to  ensure  rational 
behavior.  This  inversion  effectively  ensures  that  the  expected  utility  approach  would 
always  be  considered  to  be  rational:  in  fact,  if  any  other  behavior  is  observed,  it 
would  be,  by  definition,  irrational. 

This  inversion  of  premises  and  conclusions  is  also  apparent  in  other  arguments, 
based  on  pragmatic  necessity  considerations,  for  the  superiority  of  the  subjectivist 
approach.  If  decisions,  even  those  to  obtain  more  information,  must  be  made,  then 
the elements required to make the decision (i.e., utility functions and degrees of
belief)  must  be  assessed.  Conversely,  any  decision  implies  that  such  values  have  been, 
whether  knowingly  or  not,  chosen  in  some  form  or  fashion.  As  a  result  of  this  close 
relation  between  the  assessment  of  situations  and  the  selection  of  suitable  courses  of 
action,  guaranteed  by  the  fact  that  values  of  expected  utilities  (i.e.,  numbers)  may 
always  be  totally  ordered,  it  is  claimed  that  the  subjectivist  approach  is  the  only 
one  among  approximate  reasoning  methods  that  has  a  rational  decision-theoretic 
apparatus. 

As appealing as such claims may be to some decision-makers, we must note again
a curious exchange of roles in the scientific discovery process: decisions no longer
follow from empirical observation and rational cogitation; rather, parameters that
describe knowledge follow from a practical need to choose suitable actions. However
pressing the need to derive decisions may be, it should be clear that, in the absence of
information, it is usually impossible to determine what is the best course of action.
Any randomizing device would, under such circumstances, provide a total ordering of
possible choices, but there is very little to assure us that any behavior based on such
an arbitrary basis ought to be called rational.

The  ultimate  goal  of  an  intelligent  system  is  to  take  actions  based  on  knowledge 
about the actual rather than the believed behavior of a real-world system. It is
difficult to see why, as noted by Kyburg [22], the latter should be given much attention
outside  psychological  research.  If  applied  science  is,  as  generally  admitted,  a  rational 
enterprise  that  seeks  to  uncover  the  secrets  of  the  universe  and  to  provide  guidelines 
to  take  actions  based  on  such  knowledge,  then  it  is  clearly  desirable  that  intelligent 
agents,  in  their  quest  for  similar  objectives,  follow  as  closely  as  possible  the  essential 
procedures  of  the  scientific  method.  The  ability  to  produce  decisions  regardless  of 
the  extent  and  pertinence  of  available  knowledge  should  be  regarded  as  a  handicap 
rather  than  as  an  advantage  of  a  procedure:  a  fact  readily  noticed  by  those  engaged  in 
the solution of important real-life problems [12]. As we pointed out before, whenever
such  knowledge  is  acquired,  it  is  typically  reported  using  a  format  that  emphasizes 
the  quality  of  the  observational  method  and  the  strength  of  the  arguments  leading 
from  empirical  data  to  the  author’s  conclusions  rather  than  on  the  basis  of  personal 
confidence  expressed  by  willingness  to  take  gambling  risks. 

I  have  made  a  rather  long  exposition  about  the  dichotomy  between  subjectivist 
and  objectivist  approaches  to  probability  primarily  because  I  believe  this  to  be  a 
major  cause  of  a  controversy  that,  beyond  considerations  that  are  solely  germane  to 
probabilistic  reasoning,  extends  to  the  need  for  techniques  that  are  not  directly  based 
on  subjectivist  orthodoxy.  I  have  also  been  motivated  by  the  desire  to  clearly  expose 
a  personal  position  that  is  shared  by  many  in  the  approximate  reasoning  community 
but  that  is  also  often  misleadingly  described  as  being  antiprobabilistic. 

Far from being antagonistic to one approach for the simple sake of promoting others,
my eclectic view is the direct result of practical experience with the development
of models of complex systems, and of close familiarity with the application of mathematics
to technological problems. Probability is indeed a powerful tool to describe
chance-related aspects of the behavior of real-world systems. Recent contributions of
probabilists and decision scientists, within and without the context of AI, such as the
development of network-oriented procedures for probabilistic reasoning [27], are most
important additions to our methodological arsenal.

There are, however, limitations on the capabilities of any tool, whether for system
analysis or for any other purpose. As is true of any tool, including all methodologies
described in this note, the applicability of probability is limited by its inability
to perform functions that lie outside its scope, and by practical constraints on our
ability to use it in specific situations. In spite of its unquestionable utility, other approaches
also play a significant role in the description of the possible state of affairs.
These techniques must not be considered to be competitors of probability but, rather,
complementary techniques to enhance the understanding of the real world.



4.2  Generalized  Probabilistic  Reasoning 

Those who worry about the potential lack of applicability of techniques based on conventional
probability formalisms do not question the conceptual validity of probability
as the appropriate tool to measure the frequency of occurrence of diverse events under
various conditions or, in some cases, the strength of belief of decision-makers. Concerns
about the problems caused by ignorance of probability values, however, have
been expressed continuously since the nineteenth century by such prominent logicians
as George Boole [3], and have led to the development of approaches to represent
probabilistic ignorance by using subsets of possible probability values.

If,  for  example,  the  probability  of  validity  of  a  proposition  p  is  unknown,  an 
interval  probability  method  will  represent  such  ignorance  by  assigning  the  interval 
[0, 1]  as  the  value  of  the  missing  probability.  If  it  is  known,  on  the  other  hand,  that 
an  event  has  better  than  even  chances  of  occurring,  such  knowledge  will  be  represented 
by the [0.5, 1] interval. More generally, probabilistic knowledge may be represented
as a set of possible probability values in a hyperdimensional cube, as in the convex
probabilities approach of Kyburg [20].

The corresponding probabilistic calculi are straightforward conceptual extensions
of the classic, number-based calculus. Such extensions produce, for example, intervals
of expected utility values on the basis of knowledge expressed as a set of possible
probability values. These intervals may be used, in many instances, to rank decisions
in the same way that such choices are ordered with number-based schemes. When
this ordering is not possible (e.g., overlapping intervals show that under certain scenarios
A is preferable to B, while, in other situations, B is to be preferred), the lack
of a clear choice does not imply that the decision-theoretic apparatus is defective.
Rather, the methodology is rich enough to tell us precisely how far empirical knowledge,
combined with the laws of rational thought, can take us. If, beyond that point,
it is imperative to do something (a rather unfortunate set of events), any selection
scheme, from that point on, will be as rational as any other (i.e., very little).
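
The ranking of decisions by intervals of expected utility can be sketched as follows. This is a minimal illustration under stated assumptions: there is a single uncertain event whose probability is known only to lie in an interval, the utilities are hypothetical, and decisions are compared by simple interval dominance, which is one of several possible comparison rules.

```python
def expected_utility_interval(u_if_true, u_if_false, p_low, p_high):
    """Range of expected utility when Prob(event) lies in [p_low, p_high]."""
    # Expected utility is linear in the probability, so its extreme
    # values over the interval are attained at the two endpoints.
    ends = [p * u_if_true + (1.0 - p) * u_if_false for p in (p_low, p_high)]
    return min(ends), max(ends)

def compare(first, second):
    """Rank two decisions by interval dominance, when that is possible."""
    if first[0] > second[1]:
        return "first preferred"
    if second[0] > first[1]:
        return "second preferred"
    return "undetermined"

# The event is known to have better than even chances: Prob in [0.5, 1].
A = expected_utility_interval(10.0, 0.0, 0.5, 1.0)   # (5.0, 10.0)
B = expected_utility_interval(4.0, 4.0, 0.5, 1.0)    # (4.0, 4.0)
C = expected_utility_interval(8.0, 6.0, 0.5, 1.0)    # (7.0, 8.0)

print(compare(A, B))   # A dominates B for every admissible probability
print(compare(A, C))   # overlapping intervals: no ranking is justified
```

In the second comparison the available knowledge does not single out a best decision: at probability 0.5 decision C is better, while at probability 1 decision A is better.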

Although the manipulation of intervals and sets of possible probability values alleviates
some conceptual worries, it hardly helps in terms of the ability to perform
the required computations. The situation, unfortunately, is made worse by the need
to represent and manipulate probability bounds for subsets without the simplifying
help that additivity provides for actual probability values. This unfortunate state
of affairs is the primary reason for the popularity that one approach, capable of being
interpreted in terms of interval probabilities, enjoys today as one of the major
methodologies of approximate reasoning. This approach is the Dempster-Shafer calculus
of evidence.

Originally developed by Dempster [7] in the context of statistical studies, the approach
was further developed by Shafer [36] as a non-Bayesian alternative to the
representation and manipulation of degrees of belief. Recently [32], application of
possible-world semantic models to the interpretation of its major structures has shown
that the approach is fully consistent with the classical calculus of probability, including
the Bayes-Laplace formula. Smets [40] has also recently reviewed the structures
of the calculus of evidence proposing, in addition, unconventional extensions based
on a nonprobabilistic concept of belief.

The calculus of evidence may be readily understood using our basic model if it is
recalled that, whenever assessing the validity of a hypothesis on the basis of empirical
knowledge, there are three possible logical outcomes of any reasoning process: the
hypothesis may be proved to be true, the hypothesis may be proved to be false, or
the information may be insufficient to reach either of those conclusions.

If the notation $\mathbf{K}p$ is used to denote the set of situations, i.e., possible worlds,
where $p$ can be proved true, if $\mathbf{K}\neg p$ correspondingly denotes those cases where it
can be proved false, and if $\mathbf{I}p$ denotes the set of situations where the truth value of
$p$ cannot be established without ambiguity, then it is obvious that any probability
function $\mathrm{Prob}(\cdot)$ will satisfy the equation

$$\mathrm{Prob}(\mathbf{K}p) + \mathrm{Prob}(\mathbf{K}\neg p) + \mathrm{Prob}(\mathbf{I}p) = 1.$$

Furthermore, since the probability of $\mathbf{I}p$ may be positive, it will be true, in general,
that

$$\mathrm{Prob}(\mathbf{K}p) + \mathrm{Prob}(\mathbf{K}\neg p) \le 1.$$

The calculus of evidence is based on the representation of the probabilistic information
conveyed by evidence by means of belief functions. These functions may
be readily interpreted in terms of the above probabilities of provability through the
equation

$$\mathrm{Bel}(p) = \mathrm{Prob}(\mathbf{K}p).$$

More importantly, these belief functions are usually expressible in a compact form by
means of basic probability assignments or mass functions. These functions $m$, which
are also defined over propositions, are related to belief functions by the equation

$$\mathrm{Bel}(p) = \sum_{q \Rightarrow p} m(q).$$

The ability to represent and manipulate probability intervals by means of mass functions
is the major reason for the appeal of the Dempster-Shafer methodology.

Although, in a typical decision problem, we are interested in the truth of $p$ rather
than its provability, lack of adequate information precludes determination of the probability
of such truth. In general, however, it may be said that

$$\mathrm{Bel}(p) \le \mathrm{Prob}(p) \le 1 - \mathrm{Bel}(\neg p).$$

Furthermore, these bounds cannot be improved.
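
The relation between mass functions, belief functions, and probability bounds can be illustrated with a small computation. The frame of three outcomes and the mass values below are hypothetical; propositions are represented as sets of outcomes, so that "q implies p" becomes "q is a subset of p".

```python
# Hypothetical frame of three possible outcomes.
FRAME = frozenset({"a", "b", "c"})

# Assumed mass function (basic probability assignment); masses sum to 1.
mass = {
    frozenset({"a"}): 0.5,
    frozenset({"b"}): 0.2,
    FRAME: 0.3,   # mass not committed to anything more specific than the frame
}

def bel(p, m):
    """Bel(p): total mass of the propositions q that imply p (q subset of p)."""
    return sum(value for q, value in m.items() if q <= p)

def probability_bounds(p, m, frame):
    """Interval [Bel(p), 1 - Bel(not p)] that contains Prob(p)."""
    return bel(p, m), 1.0 - bel(frame - p, m)

lower, upper = probability_bounds(frozenset({"a"}), mass, FRAME)
print(lower, upper)
```

For the proposition "a" the sketch yields a lower bound of 0.5 and an upper bound of 0.8; the gap between them is the mass assigned to the whole frame, that is, the probability that the evidence settles nothing about "a".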

This interpretation of the Dempster-Shafer calculus as concerned with probabilities
of provability, as Pearl [27] calls them, was first formalized by the author using a
possible-worlds model based on the use of a modal logic called epistemic logic. The
formal system, which is equivalent to the modal system S5 [19] used by Moore [25] in
his pioneering work on the application of modal logic concepts to artificial intelligence
problems, is enhanced by consideration of probability distributions over the set of
possible worlds. In particular, the unary operator $\mathbf{K}$ represents the knowledge of a
rational agent: $\mathbf{K}p$ holds when the proposition $p$ is known, or can be proved, to be true.

The  probability  of  the  set  of  all  possible  worlds  where  a  proposition  p  is  the  most 
specific proposition that is known to be true, called the epistemic set, corresponds to
the  values  of  the  mass  function.  In  any  possible  world,  this  most  specific  knowledge  is 
the  conjunction  of  all  propositions  that  are  known  to  be  true  in  that  possible  world. 

The  semantic  model  of  the  Dempster-Shafer  theory  also  validates  the  so-called 
Dempster’s  rule  of  combination,  which  permits  the  combination  of  belief  and  mass 
functions  corresponding  to  evidential  observations  made  under  certain  conditions  of 
independence. When such conditions do not hold, use of this formula leads, of
course, to erroneous results, which are often, although incorrectly, considered to be an
essential handicap of the evidential reasoning approach rather than a consequence of its
misapplication.
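
The text above does not spell out the rule of combination. As a hedged illustration, the sketch below implements its customary normalized form, in which products of masses are accumulated on intersections and the mass falling on the empty set is renormalized away. The two bodies of evidence are hypothetical, and the result is meaningful only under the independence conditions just mentioned.

```python
from collections import defaultdict

def combine(m1, m2, tol=1e-12):
    """Customary normalized form of Dempster's rule (assumes independence)."""
    raw = defaultdict(float)
    conflict = 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            common = b & c
            if common:
                raw[common] += mass_b * mass_c
            else:
                conflict += mass_b * mass_c   # mass falling on the empty set
    if conflict >= 1.0 - tol:
        raise ValueError("the two bodies of evidence are totally conflicting")
    return {a: value / (1.0 - conflict) for a, value in raw.items()}

# Hypothetical frame and two hypothetical bodies of evidence.
FRAME = frozenset({"a", "b", "c"})
m1 = {frozenset({"a"}): 0.6, FRAME: 0.4}
m2 = {frozenset({"b"}): 0.5, FRAME: 0.5}

combined = combine(m1, m2)
for proposition, value in combined.items():
    print(sorted(proposition), round(value, 3))
```

Here the conflicting product (0.6 times 0.5) is removed and the remaining masses are divided by 0.7, so the combined masses again add up to one.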

From our perspective the only substantial example of such misapplication is that
which results from improper use of Dempster’s rule of conditioning, i.e., a particular
use of the rule of combination that is valid only under special circumstances,
as a substitute for Bayes' rule. Certain methodological limitations of the calculus of
evidence, notably the lack of methods to handle with sufficient generality the counterparts
of conventional conditional probabilities, are more worrisome, in our opinion,
than any distress arising from its misuse or its supposed lack of a decision-making
apparatus.


4.3  Possibilistic  Reasoning 

Our basic semantic model also provides straightforward interpretations [33] for the
major concepts and structures of possibility theory [46,9]: an approach to approximate
reasoning derived from multivalued logics [31] and the theory of fuzzy sets [45].
The major formal tool that enhances our understanding of such structures is not a
probabilistic measure of set size but, rather, a binary measure of proximity or distance,
called a similarity relation.

Similarity considerations play a major role in human cognitive processes [44]. Informally,
all such analogical processes are based on the notion that the validity of
some propositions in a given situation extends also to other situations where the
same basic conditions are prevalent.

In our model of possibilistic structures, the similarity between states of affairs is
expressed by a function that assigns a number between 0 and 1 to every pair of possible
worlds. The value of that function $S(w, w')$ for a pair of possible worlds quantifies the
extent of resemblance between pairs of situations or scenarios, as evaluated from the
viewpoint of the particular problem being considered. In a decision-making problem,
for example, the decision maker may define such measures to describe the extent to
which the consequences of certain decisions resemble desirable goals or objectives.

The  highest  similarity  value,  1,  indicates  that,  from  the  perspective  of  the  system 
being  studied,  both  situations  are  indistinguishable.  The  lowest  value,  0,  indicates 
that knowledge of what is true in one possible world does not help to derive what is
true  in  the  other. 

Similarity  scales  are  the  measurement  sticks  used  to  describe  the  extent  by  which 
certain  results  may  be  extrapolated  from  one  possible  world  to  another.  Unlike  proba¬ 
bility  functions,  which  correspond  to  either  measurable  properties  of  physical  systems 
or  states  of  belief  of  rational  agents,  the  similarity  relations  simply  provide  a  mecha¬ 
nism  to  describe  resemblance  between  states  of  affairs. 

Similarity relations may also be regarded as generalizations of the modal-logic
notion of accessibility or conceivability [19] by introduction of multiple binary relations
$R_\alpha$ between possible worlds (one for each value of $\alpha$ between 0 and 1), defined by

$$R_\alpha(w, w') \text{ if and only if } S(w, w') \ge \alpha.$$

These  relations  also  justify  the  use  of  a  possibilistic  terminology  that  regards  proposi¬ 
tions  as  being  possible  to  some  degree,  thereby  generalizing  the  classical  definition  of 
the  modal  operator  for  possible  truth  in  a  manner  similar  to  that  used  by  Lewis  [23] 
in  his  treatment  of  counterfactual  statements. 

Certain  requirements  must  be  imposed  to  assure  that  similarity  functions  truly 
represent  notions  of  resemblance  between  possible  situations.  Similarities  between 
identical  scenarios,  for  example,  should  have  a  value  of  1,  the  highest  possible  value. 
Furthermore,  if  two  different  possible  worlds  are  to  be  distinguished  by  means  of 
similarity  values,  then  it  also  makes  sense  to  require  that  their  similarity  be  strictly 
less  than  1.  It  is  likewise  natural  to  require  that  the  similarity  between  two  particular 
scenarios  be  a  symmetric  function,  i.e.,  w  resembles  w'  as  much  as  w'  resembles  w. 

Beyond  these  properties  of  reflexivity  and  symmetry,  it  is  also  necessary  to  require 
that similarities satisfy a generalized form of transitivity. If, given three possible worlds
$w$, $w'$, and $w''$, the worlds $w$ and $w'$ are highly similar while $w'$ and $w''$ are also highly
similar, it would be unreasonable to say that $w$ and $w''$ may be highly dissimilar. The
value of $S(w, w'')$ must, therefore, be bounded from below by a function of $S(w, w')$ and
$S(w', w'')$, as expressed by the condition

$$S(w, w'') \ge S(w, w') \circledast S(w', w''),$$

which uses the binary operation $\circledast$ to denote the required function.

If certain reasonable requirements are imposed upon the function $\circledast$, it is easy
to see that this function has the properties of triangular norms, which are usually
introduced in multivalued logics [43] to relate the truth value of a conjunction $p \land q$
to the degrees of truth of p and q. These functions are motivated, in our model, by
considerations that are related solely to metric concepts of proximity and resemblance.
Important examples of triangular norms are given by the functions

$$a \circledast b = \min(a, b), \quad a \circledast b = \max(a + b - 1, 0), \quad \text{and} \quad a \circledast b = ab,$$

called the Zadeh, Łukasiewicz, and product triangular norms, respectively.
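
As an illustration only, the three triangular norms and the transitivity condition can be sketched in Python for a finite set of worlds. The function names and the three-world similarity matrix below are assumptions made for this example and do not come from the report.

```python
from itertools import product


def t_min(a, b):
    """Zadeh triangular norm."""
    return min(a, b)


def t_lukasiewicz(a, b):
    """Lukasiewicz triangular norm."""
    return max(a + b - 1.0, 0.0)


def t_product(a, b):
    """Product triangular norm."""
    return a * b


def is_transitive(S, tnorm, tol=1e-12):
    """Check S[i][k] >= tnorm(S[i][j], S[j][k]) for every triple of worlds.

    The small tolerance only absorbs floating-point rounding.
    """
    n = len(S)
    return all(
        S[i][k] + tol >= tnorm(S[i][j], S[j][k])
        for i, j, k in product(range(n), repeat=3)
    )


# Hypothetical similarity matrix over three worlds (reflexive and symmetric).
S = [
    [1.0, 0.8, 0.5],
    [0.8, 1.0, 0.6],
    [0.5, 0.6, 1.0],
]

for name, tnorm in [("min", t_min), ("product", t_product), ("lukasiewicz", t_lukasiewicz)]:
    print(name, is_transitive(S, tnorm))
```

For this matrix the check fails for the Zadeh norm, because the similarity 0.5 between the first and third worlds is below min(0.8, 0.6), and succeeds for the two weaker norms. The choice of triangular norm therefore determines how strict the transitivity requirement is.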

Similarity functions are trivially related by the relation

$$\delta = 1 - S,$$

to functions $\delta$ that have the properties of a distance or metric function. In the
particular case where $\circledast$ is the triangular norm of Łukasiewicz, $\delta$ is an ordinary
metric or distance, which obeys the well-known triangular inequality

$$\delta(w, w'') \le \delta(w, w') + \delta(w', w'').$$

If $\circledast$ is the Zadeh triangular norm, on the other hand, the transitivity property is
equivalent to the stronger ultrametric inequality

$$\delta(w, w'') \le \max\bigl(\delta(w, w'), \delta(w', w'')\bigr).$$
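
The two equivalences can be checked by direct substitution of S = 1 − δ into the transitivity condition. The following short derivation is a sketch added for the reader's convenience; it uses only the definitions given above.

```latex
% Lukasiewicz norm: a \circledast b = \max(a + b - 1, 0)
\begin{align*}
S(w, w'') &\ge \max\bigl(S(w, w') + S(w', w'') - 1,\; 0\bigr) \\
\iff\; 1 - \delta(w, w'') &\ge \max\bigl(1 - \delta(w, w') - \delta(w', w''),\; 0\bigr) \\
\iff\; \delta(w, w'') &\le \min\bigl(\delta(w, w') + \delta(w', w''),\; 1\bigr).
\end{align*}
% Zadeh norm: a \circledast b = \min(a, b)
\begin{align*}
S(w, w'') &\ge \min\bigl(S(w, w'),\; S(w', w'')\bigr) \\
\iff\; 1 - \delta(w, w'') &\ge 1 - \max\bigl(\delta(w, w'),\; \delta(w', w'')\bigr) \\
\iff\; \delta(w, w'') &\le \max\bigl(\delta(w, w'),\; \delta(w', w'')\bigr).
\end{align*}
```

Since δ never exceeds 1, the bound in the last line of the first derivation is the ordinary triangular inequality; the second derivation gives the ultrametric inequality directly.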

The  structures  introduced  by  similarity  relations  may  be  readily  applied  to  gen¬ 
eralize  the  subset  inclusion  relations  that  are  the  fundamental  basis  of  deductive 
reasoning. These inclusion relations are typically expressed by conditional propositions
of the form “If q, then p,” stating that any state of affairs where q is true is such
that p is also true. These conditional propositions, which permit the derivation of
true propositions from knowledge of the truth of others by means of the rule of modus
ponens, may also be stated using similarity structures by saying that any q-world has
a p-world (i.e., itself) that is as similar as possible to it.

The ability to characterize proximity between possible worlds using a continuous
scale of similarity provides for a more general characterization of the inclusion relations
that hold between subsets of possible worlds (i.e., propositions). If the subset
of q-worlds is not included in that of p-worlds, we may, however, use the similarity
structure to quantify the amount of stretching required to reach a p-world from any
q-world. The degree of implication function defined by the expression

$$I(p \mid q) = \inf_{w' \vdash q} \; \sup_{w \vdash p} S(w, w'),$$

which is related to the well-known Hausdorff distance, provides such quantification
as the size of the topological neighborhood of p that encloses q, as shown in Figure 2.
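
On a finite universe the infimum and supremum become a minimum and a maximum, so the degree of implication can be computed directly. The sketch below assumes that worlds are indexed by integers and that propositions are given as nonempty sets of world indices; the matrix and names are hypothetical.

```python
def degree_of_implication(S, p_worlds, q_worlds):
    """I(p|q): worst case, over q-worlds, of the best similarity to some p-world.

    On a finite universe inf and sup reduce to min and max.
    Both world sets are assumed nonempty (p and q satisfiable).
    """
    return min(max(S[w][w_q] for w in p_worlds) for w_q in q_worlds)


# Hypothetical similarity matrix over four worlds, indexed 0..3.
S = [
    [1.0, 0.7, 0.4, 0.2],
    [0.7, 1.0, 0.4, 0.2],
    [0.4, 0.4, 1.0, 0.2],
    [0.2, 0.2, 0.2, 1.0],
]

p = {0, 1}
q_inside = {1}      # every q-world is a p-world
q_outside = {1, 2}  # world 2 lies outside p

print(degree_of_implication(S, p, q_inside))
print(degree_of_implication(S, p, q_outside))
```

When q is included in p, every q-world is its own closest p-world and the degree is 1.0. For `q_outside`, world 2 has no p-world closer than similarity 0.4, so p must be "stretched" to that level before it encloses q.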


The  ability  to  express  relationships  between  neighborhoods  of  different  sets  of 
possible  worlds  or,  equivalently,  between  propositions  permits  the  generalization  of 
the  modus  ponens  by  use  of  the  transitive  property  of  the  degree  of  implication 
function: 

$$I(p \mid r) \ge I(p \mid q) \circledast I(q \mid r),$$

illustrated in Figure 3.
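
The transitive property of the degree of implication can be checked exhaustively on a small example. The following sketch assumes a hypothetical four-world similarity matrix that is transitive under the Zadeh norm, and tests the inequality for every triple of nonempty propositions; none of the names below come from the report.

```python
from itertools import combinations, product


def degree_of_implication(S, p_worlds, q_worlds):
    """I(p|q) on a finite universe: min over q-worlds of max over p-worlds."""
    return min(max(S[w][w_q] for w in p_worlds) for w_q in q_worlds)


def nonempty_subsets(n):
    """All nonempty subsets of the worlds 0..n-1 (i.e., satisfiable propositions)."""
    return [set(c) for k in range(1, n + 1) for c in combinations(range(n), k)]


def modus_ponens_holds(S, tnorm, tol=1e-12):
    """Check I(p|r) >= tnorm(I(p|q), I(q|r)) for all nonempty p, q, r."""
    subsets = nonempty_subsets(len(S))
    return all(
        degree_of_implication(S, p, r) + tol
        >= tnorm(degree_of_implication(S, p, q), degree_of_implication(S, q, r))
        for p, q, r in product(subsets, repeat=3)
    )


# Hypothetical similarity matrix, transitive under the Zadeh (min) norm.
S = [
    [1.0, 0.7, 0.4, 0.2],
    [0.7, 1.0, 0.4, 0.2],
    [0.4, 0.4, 1.0, 0.2],
    [0.2, 0.2, 0.2, 1.0],
]

print(modus_ponens_holds(S, min))
```

The check succeeds because the matrix satisfies the transitivity condition for the chosen norm; the inequality is not guaranteed for a similarity function that violates that condition.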


The  generalized  modus  ponens  rule  of  Zadeh  [46]  is  expressed  by  means  of  pos¬ 
sibility  distributions,  which  are  themselves  defined  in  terms  of  similarities  between 
evidential  worlds  and  those  satisfying  a  given  proposition  p  [33].  From  the  viewpoint 
of our similarity-based model, the generalized modus ponens may be thought of as
a sound rule of logical extrapolation that exploits similarities between conceivable
scenarios or situations. The fundamental topological structures that permit this type
of reasoning are clearly different in character and nature from the measures of set
extension that are the conceptual basis of probabilistic reasoning.

In closing, it is important to mention that possibilistic reasoning based on fuzzy
logic  has  led  recently  to  the  implementation  of  a  large  number  of  successful  commercial 
products  [41].  These  systems,  which  have  primarily  exploited  the  applicability  of  the 
technology  to  a  variety  of  control  devices,  provide  a  clear  indication  of  the  usefulness 
of  these  ideas,  which  now  also  rest  on  clearly  understandable  theoretical  foundations. 

5  Looking  ahead 

The  ability  to  explain  the  role  and  utility  of  the  major  approximate  reasoning  ap¬ 
proaches  by  use  of  a  unifying  framework  provides  the  rational  basis  to  resolve  most 
of  the  issues  about  relative  importance  and  necessity.  Rather  than  supporting  any 
partisan  contention  about  the  superiority  of  one  methodology  over  the  others,  this 
framework  shows  instead  that  a  variety  of  tools  are  needed  to  produce  effective  de¬ 
scriptions  of  evidence  and  its  implications. 

Each  methodology  may  play  a  significant  role  in  every  potential  application  of 
approximate  reasoning  techniques:  a  role  that  complements  rather  than  substitutes 
for other procedures. In the absence of compelling theoretical arguments for rejecting
any  approximate  reasoning  position  and  in  the  presence  of  substantial  solid  evidence 
of  their  usefulness  and  applicability,  it  is  irrational  to  maintain  positions  that  are 
needlessly divisive and polemic.

Recent  investigations  showing  that  there  exist  substantial  functional  rather  than 
conceptual  similarities  between  the  network-oriented  methods  of  conventional  prob¬ 
abilistic  schemes  and  the  calculus  of  evidence  [37],  and  indicating  that  fuzzy-set  con¬ 
cepts  and  multivalued  logic  may  be  successfully  blended  to  represent  vague  knowledge 
about probabilities [2], clearly point the way toward a more productive research collaboration
between approximate reasoning specialists.

This  collaboration  should  stress  application  of  all  valid  concepts  to  the  solution  of 
practical problems rather than further continuation of the controversy about technological
superiority or necessity. In particular, the example set by Japanese researchers
in  the  development  of  a  large  number  of  commercial  products  of  evident  applicability 
illuminates  the  path  that  must  be  followed.  The  future  lies  in  the  solution  of  practi¬ 
cal  problems,  both  because  of  the  direct  importance  of  those  problems,  and  because 
conceptual  developments  and  clarifications  usually  follow,  as  is  the  case  of  the  work 
discussed  in  this  note,  from  the  experiences  gained  producing  such  solutions.  Having 
established needed conceptual bases to clarify controversial issues, we hope it is clear
that  this  is  the  time  to  apply  ideas  rather  than  to  continue  to  argue  about  them. 
Abstract


This note presents a formal semantic characterization of the major concepts and constructs of
fuzzy  logic  in  terms  of  notions  of  distance,  closeness,  and  similarity  between  pairs  of  possible  worlds. 
The  formalism  is  a  direct  extension  (by  recognition  of  multiple  degrees  of  accessibility,  conceivability, 
or  reachability)  of  the  major  modal  logic  concepts  of  possible  and  necessary  truth. 

Given  a  function  that  maps  pairs  of  possible  worlds  into  a  number  between  0  and  1,  generalizing 
the  conventional  concept  of  an  equivalence  relation,  the  major  constructs  of  fuzzy  logic  (i.e.,  condi¬ 
tioned  and  unconditional  possibility  distributions)  are  defined  in  terms  of  this  generalized  similarity 
relation  using  familiar  concepts  from  the  mathematical  theory  of  metric  spaces.  This  interpretation 
is  different  in  nature  and  character  from  the  typical,  chance-oriented,  meanings  associated  with  prob¬ 
abilistic  concepts,  which  are  grounded  on  the  mathematical  notion  of  set  measure.  The  similarity 
structure  defines  a  topological  notion  of  continuity  in  the  space  of  possible  worlds  (and  in  that  of  its 
subsets,  i.e.,  propositions)  that  allows  a  form  of  logical  “extrapolation”  between  possible  worlds. 

This  logical  extrapolation  operation  corresponds  to  the  major  deductive  rule  of  fuzzy  logic 
— the  compositional  rule  of  inference  or  generalized  modus  ponens  of  Zadeh — an  inferential  opera¬ 
tion  that  generalizes  its  classical  counterpart  by  virtue  of  its  ability  to  be  utilized  when  propositions 
representing  available  evidence  only  match  approximately  the  antecedents  of  conditional  proposi¬ 
tions.  The  relations  between  the  similarity-based  interpretation  of  the  role  of  conditional  possibility 
distributions  and  the  approximate  inferential  procedures  of  Baldwin  are  also  discussed. 

A  straightforward  extension  of  the  theory  to  the  case  where  the  similarity  scale  is  symbolic 
rather  than  numeric  is  described.  The  problem  of  generating  similarity  functions  from  a  given  set  of 
possibility  distributions,  with  the  latter  interpreted  as  defining  a  number  of  (graded)  discernibility 
relations  and  the  former  as  the  result  of  combining  them  into  a  joint  measure  of  distinguishability 
between  possible  worlds,  is  briefly  discussed. 
1  INTRODUCTION


This  note  presents  a  semantic  characterization  of  the  major  concepts  and  constructs  of  fuzzy  logic 
in  terms  of  notions  of  similarity,  closeness,  and  proximity  between  possible  states  of  a  system  that 
is being reasoned about. Informally, a “possible state” (to be formalized later using the notion of
“possible  world”)  is  an  assignment  of  a  well-defined  truth-value  (i.e.,  either  true  or  false)  to  all 
relevant  declarative  knowledge  statements  about  that  system. 

The  primary  goal  that  guided  the  research  leading  to  the  results  presented  in  this  work  has  been 
one of conceptual clarification. A great deal of energy has been directed in the past few years to debating
the methodological necessity and relative merits of various approximate reasoning methodologies. As
a result of these exchanges, the need to consider certain nonclassical approaches has been questioned
on  a  variety  of  bases. 

Recognizing  the  need  for  the  development  of  sound  semantic  formalisms  that  shed  light  on  the 
nature  of  different  approaches,  the  author  has  pursued,  in  the  past  few  years,  a  line  of  theoretical 
research  seeking  to  describe  various  approximate  reasoning  methodologies  using  a  common  frame¬ 
work.  These  investigations  have  recently  shown  the  close  connection  between  the  Dempster-Shafer 
calculus  of  evidence  [35]  and  epistemic  logics.  This  relationship  was  elucidated  by  straightforward 
application  of  conventional  probabilistic  concepts  to  models  of  knowledge-states  that  distinguish 
between  the  truth  of  a  proposition  and  knowledge  (by  rational  agents)  of  that  truth.  Central  to 
this development is the notion of “possible world” used by Carnap [6] to develop logical bases for
probability  theory. 

The  same  central  notion  of  possible  state  of  affairs  is  also  the  conceptual  basis  of  the  results 
presented  in  this  note,  which  is  aimed  at  establishing  the  semantic  bases  of  possibilistic  logic  with 
emphasis  on  the  study  of  its  possible  relations  and  differences,  if  any,  with  probabilistic  reasoning. 

The  results  of  this  investigation  clearly  show  that  possibilistic  logic  can  be  interpreted  in  terms 
of  nonprobabilistic  concepts  that  are  related  to  the  notions  of  continuity  and  proximity.  The  major 
functional  structures  of  fuzzy  logic,  i.e.,  possibility  and  necessity  distributions,1  may  be  defined  in 
terms  of  the  more  primitive  notion  of  similarity  between  possible  states  of  a  system  using  constructs 
that  are  the  direct  extension  of  well-known  concepts  in  the  theory  of  metric  spaces.  The  topological 
metric  structure  that  is  so  defined  may  be  used  to  derive  a  sound  inferential  rule  that  is  a  form 
of  logical  “extrapolation.”  This  rule  is  also  shown  to  be  the  compositional  rule  of  inference  or 
generalized modus ponens proposed by Zadeh [53]. Conversely, possibility distributions, which express
resemblance from some specific regard, may be used to derive the actual similarity functions, which
discern between possible worlds from the joint viewpoint of several respects.

The  constructs  that  are  used  to  derive  the  interpretation  presented  in  this  note  are  formally, 
structurally,  and  conceptually  different  from  those  that  explain  probabilistic  reasoning,  in  either 
its  objective  or  subjective  interpretations,  irrespective  of  methodological  reliance  on  interval-based 
approaches to represent ignorance. The latter class of methods, which measure the relative proportion
of (either observed or believed) occurrence of some event, are based on the mathematical notion of
set measure, while the former, which seek to establish similarities between situations that may be used
for analogical reasoning, are related to the theory of distances and metric spaces.

1 It is important to remark that the scope of this work is limited to the most fundamental concepts and constructs
of fuzzy logic without examining related notions such as, for example, generalized quantifiers.

This  presentation  of  the  relationships  between  similarity-based  concepts  and  possibilistic  notions, 
while  grounded  on  a  formal  treatment  that  is  based  on  rigorous  logical  and  mathematical  formalisms, 
will  be  kept  at  a  level  that  is  as  informal  as  possible.  The  purpose  of  this  presentation  style  is 
to  facilitate  comprehension  of  major  ideas  without  the  clutter  that  would  need  to  be  otherwise 
introduced  to  keep  matters  strictly  precise.  For  this  reason,  we  will  refrain  from  formal  introduction 
of  structures  and  axiom  schemata,  that,  although  correct  and  proper,  may  encumber  understanding 
of  the  basic  concepts. 

Before  we  proceed  to  the  detailed  consideration  of  semantic  models,  I  must  briefly  remark  on 
the  epistemological  implication  of  these  developments.  The  present  interpretation  is  not  claimed 
to  be  the  only  one  that  may  be  advanced  to  define  the  notion  of  possibility  in  terms  of  simpler 
concepts,  nor  do  I  claim  that  it  may  not  be  sometimes  possible,  even  desirable,  to  model  possibilistic 
structures  from  other  bases.  My  intent  is  not  to  prove  the  conceptual  superiority  of  one  approach 
over  another  or  to  argue  about  the  relative  utility  of  different  technologies.  Rather,  I  hope  that  these 
results have helped to establish the basic conceptual differences in the treatment of imprecise
and  uncertain  information  that  are  inherent  in  probabilistic  and  possibilistic  methods;  the  former 
oriented  toward  quantifying  believed  or  measured  frequency  of  occurrence,  and  the  latter  seeking  to 
determine  propositions — implied  by  the  evidence — that  are  similar,  in  some  sense,  to  a  hypothesis 
of  interest.  In  other  words,  beyond  accidental  domain-specific  relations,  both  types  of  methods  are 
needed  to  analyse  and  clarify  the  significance  of  imprecise  and  uncertain  information. 
2  APPROXIMATE  REASONING  AND  POSSIBLE  WORLDS


Our  point  of  departure  is  the  model-theoretic  formalisms  of  modal  logics.  Let  us  assume  that 
declarative  statements  about  the  state,  situation,  or  behavior  of  a  real-world  system  under  study 
are  symbolically  represented  by  the  letters  of  some  alphabet 

$$\{p, q, \ldots\},$$

which are combined in the customary way using the logical operators $\lor$, $\land$, $\to$, and $\leftrightarrow$ (to be
interpreted with their usual meanings) to derive a language $\mathcal{L}$ (i.e., a collection of sentences).
Furthermore, we augment this language by use of two unary operators N and Π, called the necessity
and possibility operators, respectively, having usage governed by the rule

If $\phi$ is a sentence, then $N\phi$ and $\Pi\phi$ are also sentences,

introducing the ability to represent different modalities for the truth of propositions.

A  model  for  this  propositional  system  is  a  structure  consisting  of  three  components: 

1.  A  nonempty  set  of  possible  worlds  U  introduced  to  represent  states,  situations,  or  behaviors 
of  the  system  being  modeled  by  our  sentences.  In  what  follows  we  will  refer  to  this  set  as  the 
universe  of  discourse,  or  universe,  for  short. 

We will also need to consider a nonempty subset $\mathcal{E}$ of the universe U, which is introduced
to model the set of conceivable worlds that are consistent with observed evidence. This set
(possibly equal to the whole universe U) will be called the evidential set. Throughout this
note, we will assume that evidence about the world is always given by means of conventional
propositions that allow one to determine, without ambiguity, whether a possible world either is or
is not a member of the evidential set.2

2.  A  function  (called  a  valuation)  that  assigns  one  and  only  one  of  the  truth  values  true  or  false 
to every possible world w in the universe U and every sentence $\phi$ in the language. Assignment
of the truth-value true to a pair $(w, \phi)$ will be denoted $w \vdash \phi$ (i.e., $\phi$ is true in the world w).

In  what  follows,  we  will  use  the  same  symbols  to  describe  subsets  of  possible  worlds  and  the 
propositions  that  are  true  only  in  worlds  that  are  members  of  such  subsets.  For  example,  the 
symbol $\mathcal{E}$ will be used to denote both the evidential set and the proposition that asserts the
validity of the corresponding evidential observations. Using this notation, for example, we
will write $w \vdash \mathcal{E}$ to indicate that the world w is compatible (i.e., logically consistent) with the
evidence $\mathcal{E}$.

Furthermore, we will use the symbol $\mathcal{L}$, introduced above as a set of well-formed sentences,
to denote also the power set of the universe U. Rigorously, subsets of U strictly correspond
to the classes of equivalence of the sentence set $\mathcal{L}$ that are obtained by equating logically
equivalent sentences. In the same simplifying vein, we will drop also the customary distinction
between sentences (the linguistic expressions of something that may be true or false) and
propositions (the actual things being asserted).

2 For the sake of simplicity, fuzzy evidential facts such as “Tom is rich,” usually considered in fuzzy logic, will not
be treated in this note. The meaning of such assertions will be discussed in a forthcoming paper.

3. A binary relation R, between possible worlds, called the accessibility, conceivability, or reachability
relation, introduced to model the semantics of the modal operators N and Π.

It  is  not  necessary  to  review  here  the  well-known  axioms  [21]  that  restrict  the  assignment  of 
truth values to well-formed sentences according to the rules of propositional logic. To facilitate
comprehension  of  our  formalism,  we  need  to  recall  solely  the  rules  that  constrain  assignment  of 
truth  values  to  sentences  formed  by  prefixing  other  valid  expressions  with  the  modal  operators,  i.e., 

1. The sentence $\phi$ is necessarily true in the possible world w (i.e., $w \vdash N\phi$) if and only if it is true
in every world w' that is related to the world w by the relation R.

2. The sentence $\phi$ is possibly true in the possible world w (i.e., $w \vdash \Pi\phi$) if and only if it is true
in some world w' that is related to the world w by the relation R.
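
The two rules above can be sketched for a finite model as follows. The representation is an assumption made for illustration: the accessibility relation is a dictionary mapping each world to the set of worlds related to it, and a sentence is represented by the dictionary of its truth values; the world names are hypothetical.

```python
def necessarily(R, truth, w):
    """N(phi) holds at w iff phi is true in every world related to w by R."""
    return all(truth[w2] for w2 in R[w])


def possibly(R, truth, w):
    """Pi(phi) holds at w iff phi is true in some world related to w by R."""
    return any(truth[w2] for w2 in R[w])


# Hypothetical three-world model with a reflexive accessibility relation.
R = {
    "w1": {"w1", "w2"},
    "w2": {"w2"},
    "w3": {"w1", "w3"},
}

# Truth values of one sentence phi in each world.
phi = {"w1": True, "w2": True, "w3": False}

for w in sorted(R):
    print(w, necessarily(R, phi, w), possibly(R, phi, w))
```

In this model the sentence is necessarily true at w1 and w2, while at w3 it is possibly but not necessarily true, since w3 reaches one world where it holds and one where it fails.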

If, for example, the relation R relates worlds that share the same (possibly empty) subset of true
sentences of the prespecified set of expressions

S—  i 

i.e., R(w,w') if and only if any sentence $\phi$ in $\mathcal{S}$ is either true in both w and w' or false in both
w and w', then the resulting system has an “epistemic” interpretation that regards related possible
worlds as “being possible for all we know” (i.e., observed evidence, corresponding to a subset of
$\mathcal{S}$, is the same for both worlds). In this case, the necessity operator N corresponds to the epistemic
operator K of epistemic logics, with the corresponding system having the properties of the modal
system  S5,  which  was  used— in  the  context  of  probability  theory— as  the  semantic  basis  for  the 
Dempster-Shafer  calculus  of  evidence  [35]. 

If,  on  the  other  hand,  the  original  interpretation  of  logical  necessity — corresponding  to  a  relation 
R  that  is  equal  to  U  x  U,  i.e.,  that  relates  every  pair  of  possible  worlds — is  given  to  the  operator  N, 
then  a  proposition  is  necessarily  true  if  and  only  if  it  is  true  in  every  possible  world. 

If the relation R is chosen as

$$R = \mathcal{E} \times \mathcal{E},$$

then this interpretation may be used to characterize approximate reasoning problems as those where
a hypothesis of interest is neither necessarily true nor necessarily false in worlds in the evidential
set $\mathcal{E}$, reflecting the inability of conventional deductive techniques to unambiguously determine the
truth-value of the hypothesis.3

In those problems, in spite of this fundamental impossibility, we may resort to approximate reasoning
methods to describe various properties of the evidential set $\mathcal{E}$. For example, the probabilistic
structures utilized by various probabilistic reasoning approaches typically characterize relations of
the form

$$\mu(H \land \mathcal{E}) : \mu(\lnot H \land \mathcal{E}),$$

between the “measures” of the subsets of the evidential set $\mathcal{E}$ where a hypothesis H is true or false,
respectively.

3 The  notion  of  approximate  reasoning  problem  is  often  extended  to  encompass  situations  where  deductive  tech¬ 
niques  cannot  always  be  used  because  of  practical  limitations  on  computational  resources. 
Our aim will be to study how other structures, defining a metric or distance in the universe U,
may  be  used  to  describe  the  nature  of  the  evidential  set.  To  do  so,  we  will  assign  a  different  meaning 
to  the  accessibility  relation,  giving  it  an  interpretation  that  regards  related  worlds  as  “similar”  or 
“close”  in  some  sense.  We  will  require,  however,  a  scheme  that  is  richer  than  that  provided  by  a 
single relation so that we can extend modal notions and derive semantic bases for fuzzy logic, which
relies  on  concepts  of  degrees  of  matching  or  closeness  expressed  by  real  numbers  between  0  and  1. 

In what follows we will use the symbols $\Rightarrow$ and $\Leftrightarrow$ to denote strong implication and equivalence,
respectively. A proposition q strongly implies p (denoted $q \Rightarrow p$) if and only if p is true in any world
where q is. Similarly, p is logically equivalent to q (denoted $p \Leftrightarrow q$) if and only if p and q are true in
the same subset of worlds of U.

Following traditional terminology, we will say also that a proposition p is satisfiable if there exists
a possible world w such that $w \vdash p$.
3  EXTENDED  MODALITIES


We first turn our attention to the problem of generalizing modal logic formalisms to explain the
structures  and  functions  of  fuzzy  logic. 

A  number  of  authors  have  studied  various  relations  between  fuzzy  and  modal  logics.  Lakoff  [24], 
Murai et al. [28], and Schocht [36] have proposed graded generalizations of basic modal constructs.
Dubois  and  Prade  [13,14]  have  also  explored  analogies  between  these  nonstandard  logics.  In  a  recent 
paper  [12],  they  have  developed,  in  addition,  a  modal  basis  for  possibility  theory  by  means  of  the 
introduction  of  fuzzy  structures  into  modal  frameworks  with  the  goal  of  deriving  proof  mechanisms 
that  may  be  used  in  possibilistic  reasoning. 

The  goal  for  the  model  presented  in  this  note  is  somewhat  different  from  the  objectives  guiding 
those  efforts.  We  will  seek  explanations  for  possibilistic  constructs  on  the  basis  of  previously  existing 
notions  rather  than  generalizations  of  modal  frameworks  by  means  of  fuzzy  constructs.  The  model 
presented  here  is  not  based  on  the  use  of  graded  notions  of  possibility  and  necessity  as  primitive 
— and,  by  implication,  easy  to  understand — structures.  The  foundation  for  this  model  is  provided 
by  a  generalization  of  the  accessibility  relation,  which  is  given  a  simple  interpretation  as  a  measure 
of  resemblance  and  proximity  between  possible  worlds. 

We will extend the notion of accessibility relation to encompass a family of nonempty binary
relations $R_\alpha$ that are indexed by a numerical parameter $\alpha$ between 0 and 1. These relations, which
are nested, i.e.,

$$R_\alpha \subseteq R_\beta, \text{ whenever } \beta \le \alpha,$$

are introduced to represent different degrees of similarity, using a scheme that is akin to that used
by Lewis in his study of counterfactuals [25]. The family of accessibility relations introduced here
differs from that proposed by Lewis, however, in its use of numerical indexes4 and in the nature
of  the  overall  modeling  goals  that,  in  Lewis’  formalism,  are  intended  to  represent  changes  of  scale 
induced  by  consideration  of  different  restrictive  statements. 

3.1  Similarity  Relations 

To  facilitate  the  definition  of  a  family  of  accessibility  relations  we  introduce  a  similarity  function 

$$S : U \times U \to [0, 1],$$

assigning to each pair of possible worlds (w, w') a unique degree of similarity between 0 (corresponding
to maximum dissimilarity) and 1 (corresponding to maximum similarity).

With the help of this function, we will then say that w and w' are related to the degree $\alpha$,
denoted $R_\alpha(w, w')$, if and only if $S(w, w') \ge \alpha$. In this way, the relations $R_\alpha$ have the required
nesting property, with $R_0$ corresponding to the whole Cartesian product U × U (i.e., every possible
world is similar at least to degree zero to every other possible world).
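
As a small illustration, the family of accessibility relations can be obtained from a similarity matrix by thresholding. The sketch below assumes a finite universe whose worlds are indexed by integers; the matrix and the function name `alpha_cut` are hypothetical.

```python
def alpha_cut(S, alpha):
    """Return the relation R_alpha as the set of pairs (i, j) with S[i][j] >= alpha."""
    n = len(S)
    return {(i, j) for i in range(n) for j in range(n) if S[i][j] >= alpha}


# Hypothetical similarity matrix over four worlds, indexed 0..3.
S = [
    [1.0, 0.7, 0.4, 0.2],
    [0.7, 1.0, 0.4, 0.2],
    [0.4, 0.4, 1.0, 0.2],
    [0.2, 0.2, 0.2, 1.0],
]

levels = [1.0, 0.7, 0.4, 0.2, 0.0]
cuts = [alpha_cut(S, a) for a in levels]

# Relations grow as the required degree of similarity decreases.
for a, cut in zip(levels, cuts):
    print(a, len(cut))
```

Each relation in the list is contained in the next one, and the relation for degree zero contains all sixteen pairs of worlds.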

4  We  will  later  see  that  similarities  may  be  measured  using  more  general,  nonnumeric,  scales.  For  simplicity  reasons, 
we  will  avoid  at  this  point  the  introduction  of  more  general  schemes  that  unnecessarily  complicate  the  exposition. 
Some properties are required to assure that the function S has the required semantics of a
metric  relationship  capturing  the  intuitive  notion  of  similarity  or  “proximity.”  It  is  first  necessary 
to  demand  that  the  degree  of  similarity  between  any  world  and  itself  be  as  high  as  possible,  i.e., 

$$S(w, w) = 1, \text{ for all } w \text{ in } U.$$

This property assures that every one of the accessibility relations $R_\alpha$ will be reflexive and, following
the  nomenclature  introduced  by  Zadeh  for  fuzzy  relations  [52],  we  will  also  say  that  the  similarity 
relation  is  reflexive. 

Next, we will call for the function S to be symmetric, i.e.,

$$S(w, w') = S(w', w), \text{ for any worlds } w \text{ and } w' \text{ in } U.$$

This  is  a  very  natural  requirement  of  any  relation  intended  to  represent  a  relation  of  resemblance 
between  objects. 

Finally, and most importantly, we will impose a form of transitivity requirement upon the similarity
function S that turns it into a generalized equivalence relation. The purpose of this restriction
is to assure that S has a reasonable behavior as a metric in the universe of possible worlds. It would
certainly be surprising if, for some similarity S, we were to be told that w and w' are very similar
and that w' and w'' are also very similar, but that w does not resemble w'' at all. Clearly, there
should be a lower bound on the possible values of S(w, w'') that may be expressed as a function of
the values of S(w, w') and S(w', w''). We will express such a constraint using a numeric operation,
denoted $\circledast$, that takes as arguments two real numbers between 0 and 1 and that returns another
number in the same range, i.e.,

$$\circledast : [0, 1] \times [0, 1] \to [0, 1],$$

in the form of the inequality

$$S(w, w'') \ge S(w, w') \circledast S(w', w''),$$

assumed valid for any worlds w, w', and w'' in the universe U. Resorting again to modal terminology,
the above transitivity constraint, which will be called $\circledast$-transitivity, may be rewritten in relational
form as

$$R_{\alpha \circledast \beta} \supseteq R_\alpha \circ R_\beta, \quad \text{for all } 0 \le \alpha, \beta \le 1,$$

making obvious its generalization of the conventional definition of transitivity for ordinary binary
relations, i.e.,

$$R \supseteq R \circ R.$$

Since the role of $\circledast$, through recursive application, is that of providing a lower bound for the
similarity between the two end members $w_1$ and $w_n$ of a chain of possible worlds $(w_1, w_2, \ldots, w_n)$,
it is obvious that the operation $\circledast$ should be commutative and associative. Furthermore, it should
also be nondecreasing in each argument, as it is reasonable to ask that the desired lower bound be
a monotonic function of its arguments. Finally, it is also desirable to ask that

$$\alpha \circledast 1 = 1 \circledast \alpha = \alpha,$$

i.e., that the values of the similarities of two indistinguishable objects to a third should be the same.
These requirements are equivalent to demanding that the operation $\circledast$ be a triangular norm [37],
or T-norm, for short.

Triangular norms, originally introduced in the theory of probabilistic metric spaces to treat
certain statistical problems, play a distinguished role in [0, 1]-multivalued logics [1,11,17,31] as the
result of imposing reasonable requirements upon operations that produce the truth value of the
conjunction of two expressions as a function of the truth values of the conjuncts. Furthermore,
generalized similarity relations (called B-R relations by Zadeh [54]) also have an important function,
to be examined further later in this note, in the generalization of the inferential rule of modus
ponens [43,10]. Our axiomatic derivation for the requirement that $\circledast$ be a T-norm is based, however,
solely on metric considerations, applied here to a space of possible worlds, but is valid in general
metric spaces.

From the axioms of triangular norms, it is easy to see that

$$\alpha \circledast \beta \le \min(\alpha, \beta),$$

showing that the minimum function, itself a T-norm, is the largest element in this class of operations.
Its minimal element, on the other hand, is the noncontinuous function $\circledast$ defined by

$$\alpha \circledast \beta = \begin{cases} \alpha, & \text{if } \beta = 1, \\ \beta, & \text{if } \alpha = 1, \\ 0, & \text{otherwise.} \end{cases}$$

Every symmetric and reflexive relation is $\circledast$-transitive for this triangular norm, which is, therefore,
of little practical utility.

In what follows, we will also impose a most reasonable additional assumption of continuity of
$\circledast$ with respect to its arguments (i.e., why should there be a jump in the value of a lower bound
provided by $\circledast$ when the values of its arguments are slightly changed?). The class of continuous
T-norms does not have a minimal element, although under certain additional assumptions (requiring
T-norms to be also 2-copulas [37]), the inequality

$$\max(\alpha + \beta - 1, 0) \le \alpha \circledast \beta$$

also holds true, showing that certain important continuous T-norms lie between that of the $\aleph_1$-logic
of Lukasiewicz [17] and that of the original fuzzy logic proposed by Zadeh [53].
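
As a rough numerical illustration of these bounds, the following sketch checks the T-norm axioms and the pointwise ordering of four familiar T-norms on a coarse grid. The function names, the grid, and the tolerance `EPS` are assumptions introduced here for illustration; a grid check is a sanity test, not a proof.

```python
import itertools

def t_lukasiewicz(a, b):
    return max(a + b - 1.0, 0.0)

def t_product(a, b):
    return a * b

def t_min(a, b):
    return min(a, b)

def t_drastic(a, b):
    # Smallest T-norm: nonzero only when one of the arguments equals 1.
    if b == 1.0:
        return a
    if a == 1.0:
        return b
    return 0.0

GRID = [k / 10 for k in range(11)]
EPS = 1e-9  # tolerance for floating-point rounding

def is_t_norm(t):
    """Check commutativity, associativity, monotonicity, and the unit law on GRID."""
    for a, b, c in itertools.product(GRID, repeat=3):
        if abs(t(a, b) - t(b, a)) > EPS:
            return False
        if abs(t(t(a, b), c) - t(a, t(b, c))) > EPS:
            return False
        if b <= c and t(a, b) > t(a, c) + EPS:
            return False
    return all(abs(t(a, 1.0) - a) <= EPS for a in GRID)

def ordered_on_grid(lower, upper):
    """True when lower(a, b) <= upper(a, b) at every grid point."""
    return all(lower(a, b) <= upper(a, b) + EPS
               for a, b in itertools.product(GRID, repeat=2))
```

On this grid the drastic T-norm lies below that of Lukasiewicz, which lies below the product, which lies below the minimum, in agreement with the inequalities stated in the text.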

Continuous  triangular  norms  play  a  significant  part  in  the  theories  of  pattern  recognition  and 
automatic  classification.  The  author  [33]  proposed  the  use  of  generalized  similarity  relations  based 
on  the  T-norm  of  Lukasiewicz  to  generalize  existing  classification  techniques — based  on  the  mapping 
of  a  similarity  function  into  a  conventional  equivalence  relation— to  the  fuzzy  domain — by  mapping 
these  T-norms  (called  likeness  relations  by  Ruspini)  into  generalized  fuzzy  partitions.  Bezdek  and 
Harris  [3]  independently  studied  axiomatic  approaches  to  cluster  analysis  based  on  the  use  of  several 
continuous  T-norms. 

The  author  has  also  studied  [34]  the  possible  relation  between  the  multivalued  logic  and  similarity 
related  aspects  of  T-norms,  and  suggested  that  the  degrees  of  similarity  between  two  objects  A  and 
B  may  be  regarded  as  the  “degree  of  truth”  of  the  vague  proposition 

“A  is  similar  to  B .” 

Having argued that S should have the structure of a generalized equivalence relation, we will
assume, mainly for reasons of simplicity, that the function S is the dual of a “true” distance, i.e.,
that

$$S(w, w') = 1 \quad \text{if and only if} \quad w = w'.$$

This restriction, which is not substantial, is introduced primarily to assure that different possible
worlds may be distinguished by means of the function S. Otherwise, the equivalence relation that
relates two worlds w and w' if and only if S(w, w') = 1 may be used to partition our universe U into
“indistinguishable” nonintersecting classes, indicating that our metric cannot discriminate between
significant differences in system state.

Before closing our presentation of generalized similarity relations, it is important to remark upon
the close relation between the notion of similarity and that of distance. If a function $\delta$ is defined in
terms of a similarity function S by the simple relation

$$\delta = 1 - S,$$

then it is easy to see that the function $\delta$ has the properties of a metric or distance. This is evident
if the operation $\circledast$ corresponds to the T-norm of Lukasiewicz, since the transitivity condition is
equivalent to the well-known triangular inequality, i.e.,

$$\delta(w, w'') \le \delta(w, w') + \delta(w', w'').$$

If other T-norms are used, even stronger inequalities hold, with the so-called “ultrametric inequality”

$$\delta(w, w'') \le \max\bigl(\delta(w, w'), \delta(w', w'')\bigr)$$

being valid for the T-norm of Zadeh. In this case, each of the relations in the family $R_\alpha$ (known in
fuzzy set theory as the $\alpha$-cuts$^5$ of the similarity S) is a conventional equivalence relation. This fact
was exploited, prior to the introduction of fuzzy set theory and fuzzy cluster analysis, by a variety
of clustering procedures of the “single-link” type [22,40].
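
The duality between similarity and distance can be checked numerically on small examples. In the sketch below, the two similarity matrices `S_LUK` and `S_MIN` and all helper names are hypothetical, chosen only so that the first is transitive for the T-norm of Lukasiewicz but not for the minimum, while the second is transitive for the minimum.

```python
import itertools

def t_lukasiewicz(a, b):
    return max(a + b - 1.0, 0.0)

def is_transitive(S, t, eps=1e-9):
    """S[i][k] >= t(S[i][j], S[j][k]) for all triples of worlds."""
    idx = range(len(S))
    return all(S[i][k] >= t(S[i][j], S[j][k]) - eps
               for i, j, k in itertools.product(idx, repeat=3))

def to_distance(S):
    """delta = 1 - S, entry by entry."""
    return [[1.0 - s for s in row] for row in S]

def triangle_holds(D, eps=1e-9):
    idx = range(len(D))
    return all(D[i][k] <= D[i][j] + D[j][k] + eps
               for i, j, k in itertools.product(idx, repeat=3))

def ultrametric_holds(D, eps=1e-9):
    idx = range(len(D))
    return all(D[i][k] <= max(D[i][j], D[j][k]) + eps
               for i, j, k in itertools.product(idx, repeat=3))

def cut_is_transitive(S, alpha):
    """True when the crisp relation {(i, j): S[i][j] >= alpha} is transitive."""
    idx = range(len(S))
    rel = [[S[i][j] >= alpha for j in idx] for i in idx]
    return all(rel[i][k] or not (rel[i][j] and rel[j][k])
               for i, j, k in itertools.product(idx, repeat=3))

# Hypothetical similarities: Lukasiewicz-transitive, but not min-transitive.
S_LUK = [[1.0, 0.8, 0.5], [0.8, 1.0, 0.6], [0.5, 0.6, 1.0]]
# Hypothetical similarities: min-transitive.
S_MIN = [[1.0, 0.8, 0.4], [0.8, 1.0, 0.4], [0.4, 0.4, 1.0]]
```

For `S_LUK` the dual distance satisfies the triangular inequality but not the ultrametric one, and its cut at level 0.55 fails to be transitive. For `S_MIN` the ultrametric inequality holds and every cut is a transitive (hence, being reflexive and symmetric, an equivalence) relation.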

3.2  Possible  and  Necessary  Similarity 

Our  semantic  formalization  needs  require  the  introduction  of  constructs  to  indicate  the  extent  by 
which  a  concept  exemplifies,  illustrates,  or  is  an  adequate  model  of  another  concept.  Our  interpre¬ 
tations  shall,  therefore,  be  oriented  toward  characterization  of  the  degree  by  which  a  concept  can 
be  said  to  be  a  good  example  of  another  concept  with  the  purpose  of  defining  vague  concepts  by 
means  of  measures  of  proximity  between  defined  and  defining  concepts.  In  our  treatment,  each  of 
the  multiple  “definiens”  will  be  a  conventional  proposition  corresponding  to  a  subset  of  possible 
worlds.  It  is  conceivable,  however,  that  new  vague  concepts  might  also  be  described  by  indicating 
their  metric  relations  to  other  vague  concepts. 

The required constructs are based on the idea that whenever p and q are propositions such that
$p \Rightarrow q$, then any p-world is an “example” of a q-world. This basic notion will be generalized by the
introduction  of  modal  structures  that  define  to  what  degree  possible  worlds  that  satisfy  a  certain 
proposition  q  fit  a  vague  concept.  Some  of  those  possible  worlds  are  “paradigmatic”  of  the  vague 
concept,  i.e.,  they  fit  it  to  a  degree  equal  to  1  in  the  same  sense  that  we  may  say,  for  example,  in  an 
absolute  (i.e.,  nongraded)  sense  that  somebody  whose  height  is  7  ft  is  definitely  “tall.”  If  we  use  a 
notion  of  graded  fitness,  however,  certain  worlds  will  fit  the  concept  to  a  degree,  i.e.,  they  resemble 
(or  are  similar)  to  some  paradigmatic  example  of  the  vague  concept. 

The  conventional  interpretation  of  possibility  needs  to  be  modified,  therefore,  to  capture  the  idea 
that  a  particular  possible  world  is  similar  in  some  degree  to  another  world  that  satisfies  a  “reference” 
proposition. 

5 The $\alpha$-cut of a fuzzy set $\mu : U \to [0, 1]$ is the conventional set of all points w such that $\mu(w) \ge \alpha$. A similar
concept is defined for relations as subsets of a product space U x V.




More  generally,  however,  we  will  be  interested  in  relations  of  similarity  between  pairs  of  subsets  of 
possible  worlds  rather  than  between  pairs  of  possible  worlds.  This  requirement  complicates  matters 
considerably  since  we  will  be  forced  to  consider  both  the  “validity”  of  a  proposition  p  in  some  world 
where  another  proposition  q  is  true,  as  well  as  its  applicability  in  every  world  where  q  is  true.  In 
the former case, we will care about the existence of q-worlds that are similar to some degree to some
p-world,  while  in  the  latter  we  will  be  concerned  with  the  size  of  the  minimum  neighborhood  of  p 
(as  a  subset  of  the  universe  U )  that  fully  encloses  the  subset  q. 

This  dual  concern  for  what  may  possibly  apply  and  what  must  necessarily  hold— an  essential 
aspect  of  modal  logic — is  typical  of  situations  where  relationships  between  ensembles  of  objects  are 
described  in  terms  of  relations  between  their  members.  In  the  probability  calculus,  for  example, 
knowledge  of  probabilities  over  certain  families  of  subsets  provides  “sharp”  upper  and  lower  bounds 
(called inner and upper probabilities, respectively) for the probabilities of other subsets, an important
fact in the extension of set measures to larger domains [19]. The role and properties of these
bounds  in  the  Dempster-Shafer  calculus  of  evidence  is  well-known,  having  been  described  in  the 
original  paper  of  Dempster  [8],  related  to  concepts  of  modal  logic  by  Ruspini  [35],  and  being  also  the 
subjects  of  considerable  formal  study  [7]  as  mathematical  structures. 

Analogies between the role of probabilistic bounds (i.e., bounds for probability values) and
possibility/necessity distributions, shown below to play a similar part with respect to metric
structures, have been the source of much of the confusion about the need for possibilistic schemes.
Each upper/lower-bound pair, however, leads to a substantially different description of the nature of a
subset of possible worlds, being, in either case, measures that arise naturally when pointwise properties
are extended to set partitions. General properties of these measures have been studied by Dubois and
Prade [11] in the context of approximate reasoning and in other regards by Pawlak [30].

Our generalizations of the notions of possibility and necessity are related to the so-called de re [21]
interpretation of the statement “If q, then p is possible” as the modal propositional relation

$$q \Rightarrow \Pi p.$$

We will say that the proposition q implies, or is a necessary model of, the proposition p to the
degree $\alpha$ if and only if for every q-world w there exists a p-world w' that is at least $\alpha$-similar to it
(i.e., $S(w, w') \ge \alpha$), or equivalently, whenever

$$q \Rightarrow \Pi_\alpha p.$$

Similarly, we will say that the proposition q is consistent with, or is a possible model of, the
proposition p to the degree $\alpha$$^6$ if and only if there exist a q-world w and a p-world w' that are at least
$\alpha$-similar, or equivalently, whenever

$$\neg (p \Rightarrow \neg \Pi_\alpha q).$$

The  similarity  function  that  we  have  introduced  in  the  universe  U  provides  us  with  a  simple 
mechanism  to  quantify  both  the  extent  of  “inclusion”  and  that  of  the  “intersection”  between  pairs 
of  subsets  of  possible  worlds.7 

6 Note that our characterizations of both possibility and necessity distributions are based on the modal possibility
operators $\Pi_\alpha$.

7 For reasons that by now should be evident, we will not need to introduce a concept of “unconditioned possibility,”
although it would be easy to do so using q = U. Being concerned with the power of certain propositions to exemplify
other conditions, we will not have much occasion to deal with the strength of tautologies in that regard.

3.3  Possibilistic  Implication  and  Consistence

The  notion  of  subset  inclusion  and  its  related  concept  of  set  identity  are  of  central  importance  in 
deductive  logic,  since  subsets  of  possible  worlds  are  formally  equivalent  to  propositions  with  subset 
inclusion  and  identity  corresponding  to  logical  implication  and  equivalence,  respectively.  These 
propositional  relationships  are  the  basis  of  derivation  rules  such  as  the  modus  ponens.  The  notion 
of  intersection  plays  a  similar  role  in  modal  analyses  because  of  its  ability  to  express  the  potential 
validity  of  a  statement. 

Classical  accounts,  however,  recognize  only  two  “degrees”  of  inclusion  corresponding  to  the  cases 
when  either  a  set  q  is  a  subset  of  another  set  p  or  it  is  not,  with  a  similar  dichotomy  applying  to 
degrees of intersection. Our generalization exploits the metric structures defined between sets of
possible worlds by introducing measures that describe a subset as enclosed in a neighborhood (of some
size) of another set while intersecting another of its neighborhoods (of “smaller” size).$^8$ The problem
of  measuring  the  “size”  of  those  neighborhoods  is  the  subject  of  our  immediate  considerations. 

3.3.1  Degree  of  Implication

Our definition of partial implication between propositions was based on conditions that determine
whether, given two propositions p and q, one of them implies the other to some degree $\alpha$. In
particular, since every world w is always similar in a degree that is at least equal to zero to any
other world w', it is always true that any proposition q implies any other proposition p to the degree
zero. It is often the case, however, that the degree of implication between p and q is at least equal
to some positive value $\alpha$.

If we want to generalize procedures based on inclusion relationships, such as the modus ponens,
in an efficient fashion, we will need to measure the “optimal” (or maximum) value of the parameter $\alpha$
such that q implies p to the degree $\alpha$. This value is a measure of the degree by which the set of all
p-worlds must be “stretched” to encompass the set of all q-worlds. The least upper bound of the
values of the similarities between any q-world w' and some p-world w (depending, in general, on
w') is given by the degree of implication function:

Definition: The degree of implication of p by q is the value

$$I(p \mid q) = \inf_{w' \vdash q} \, \sup_{w \vdash p} S(w, w').$$


Defined in this way, the degree of implication $I(p \mid q)$ is a measure of the “minimal amount” of
stretching required to reach a p-world from any q-world, in the sense that if $\beta \le I(p \mid q)$, then

$$q \Rightarrow \Pi_\beta p.$$

Furthermore, $I(p \mid q)$ is the largest real value for which the above statement may be made.
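
On a finite universe the infimum and supremum become a minimum and a maximum, so the degree of implication can be computed directly. The sketch below assumes a hypothetical four-world similarity matrix and represents propositions as sets of world indices; the names `degree_of_implication` and `implies_to_degree` are illustrative choices, not notation from the text.

```python
# Hypothetical similarity matrix over four possible worlds (illustrative values).
S = [
    [1.0, 0.9, 0.5, 0.2],
    [0.9, 1.0, 0.5, 0.2],
    [0.5, 0.5, 1.0, 0.2],
    [0.2, 0.2, 0.2, 1.0],
]

def degree_of_implication(S, p, q):
    """I(p | q): minimum over q-worlds of the best similarity to some p-world."""
    return min(max(S[w][wq] for w in p) for wq in q)

def implies_to_degree(S, p, q, alpha):
    """q => Pi_alpha p: every q-world has a p-world at least alpha-similar to it."""
    return all(any(S[w][wq] >= alpha for w in p) for wq in q)

p = {0}       # worlds where p holds
q = {1, 2}    # worlds where q holds

i_pq = degree_of_implication(S, p, q)  # min(0.9, 0.5)
i_qp = degree_of_implication(S, q, p)  # max(0.9, 0.5)
```

Here q implies p only to the degree 0.5, because world 2 is far from the single p-world, whereas p implies q to the degree 0.9. The function is therefore not symmetric, and the threshold test succeeds exactly for levels up to `i_pq`.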

As  the  following  theorem  makes  clearer,  this  function  provides  the  bases  for  the  generalization 
of  the  modus  ponens.  This  truth-derivation  procedure  may  be  thought  of  as  an  expression  of  the 
nesting  relationships  that  hold  between  the  sizes  of  neighborhoods  of  such  subsets. 

8 It is important to recall that, due to our reliance on similarity rather than on the dual notion of dissimilarity or
distance, high values of $\alpha$ correspond to low values of “stretching” or to smaller set neighborhoods.

Theorem: The degree of implication function,

$$I : \mathcal{L} \times \mathcal{L} \to [0, 1],$$

has the following properties:

(i) If $p \Rightarrow r$, then $I(p \mid q) \le I(r \mid q)$

(ii) If $q \Rightarrow r$, then $I(p \mid q) \ge I(p \mid r)$

(iii) $I(p \mid q) \ge I(p \mid r) \circledast I(r \mid q)$

where p, q, and r are any satisfiable propositions.

Proof: The first two properties are an immediate consequence of the definition of degree of
implication. To prove the third, observe that by definition of similarity

$$S(w, w') \ge S(w, w'') \circledast S(w'', w')$$

for any worlds w, w', and w''.

Taking the supremum on both sides of this inequality with respect to all worlds $w \vdash p$, it follows,
because $\circledast$ is continuous, that

$$\sup_{w \vdash p} S(w, w') \ge \Bigl[ \sup_{w \vdash p} S(w, w'') \Bigr] \circledast S(w'', w').$$

Since this expression is true, in particular, for all worlds $w'' \vdash r$, it is true that

$$\sup_{w \vdash p} S(w, w') \ge \Bigl[ \inf_{w'' \vdash r} \, \sup_{w \vdash p} S(w, w'') \Bigr] \circledast S(w'', w') = I(p \mid r) \circledast S(w'', w'),$$

where w'' is any world such that $w'' \vdash r$.

From this inequality, it follows, since $\circledast$ is continuous, that

$$\sup_{w \vdash p} S(w, w') \ge I(p \mid r) \circledast \Bigl[ \sup_{w'' \vdash r} S(w'', w') \Bigr].$$

Taking now the infimum on both sides of this expression over all worlds w' such that $w' \vdash q$, it is
easy to see, using again the continuity of $\circledast$, that

$$\inf_{w' \vdash q} \, \sup_{w \vdash p} S(w, w') \ge I(p \mid r) \circledast \Bigl[ \inf_{w' \vdash q} \, \sup_{w'' \vdash r} S(w'', w') \Bigr],$$

proving the $\circledast$-transitivity of I. ∎

Note that, since $I(q \mid q) = 1$ for any proposition q, the following statement is also true:

Corollary: If p and q are propositions in $\mathcal{L}$, then

$$I(p \mid q) = \sup_{r} \bigl[ I(p \mid r) \circledast I(r \mid q) \bigr].$$
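
The theorem and its corollary can be checked exhaustively on a small finite universe. The following sketch assumes a hypothetical four-world similarity matrix that is transitive for the minimum (the T-norm of Zadeh), identifies propositions with nonempty subsets of worlds, and reads $p \Rightarrow r$ as subset inclusion. All names are illustrative.

```python
import itertools

# Hypothetical min-transitive similarity matrix over four possible worlds.
S = [
    [1.0, 0.9, 0.5, 0.2],
    [0.9, 1.0, 0.5, 0.2],
    [0.5, 0.5, 1.0, 0.2],
    [0.2, 0.2, 0.2, 1.0],
]
WORLDS = range(len(S))

# Every satisfiable proposition corresponds to a nonempty subset of worlds.
PROPS = [frozenset(c)
         for size in range(1, len(S) + 1)
         for c in itertools.combinations(WORLDS, size)]

def implication(p, q):
    """I(p | q) on a finite universe."""
    return min(max(S[w][wq] for w in p) for wq in q)

def monotonicity_holds():
    """Properties (i) and (ii), with p => r read as p being a subset of r."""
    triples = list(itertools.product(PROPS, repeat=3))
    first = all(implication(p, q) <= implication(r, q)
                for p, q, r in triples if p <= r)
    second = all(implication(p, q) >= implication(p, r)
                 for p, q, r in triples if q <= r)
    return first and second

def transitivity_holds(t=min):
    """Property (iii): I(p | q) >= t(I(p | r), I(r | q)) for all p, q, r."""
    return all(implication(p, q) >= t(implication(p, r), implication(r, q))
               for p, q, r in itertools.product(PROPS, repeat=3))

def corollary_holds(t=min):
    """I(p | q) equals the maximum over r of t(I(p | r), I(r | q))."""
    return all(implication(p, q) ==
               max(t(implication(p, r), implication(r, q)) for r in PROPS)
               for p, q in itertools.product(PROPS, repeat=2))
```

The corollary's maximum is attained at r = q, where $I(q \mid q) = 1$ and the T-norm returns its other argument unchanged.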

Notice also that if $I(p \mid q) = 1$, then

$$\sup_{w \vdash p} S(w, w') = 1, \quad \text{for all } w' \vdash q.$$

Under minimal assumptions (assuring that the supremum operation is actually a maximization),
this relation is equivalent to stating that q strongly implies p, or that any q-world is also a p-world.

The nonsymmetric function I measures the extent by which every world w' in a certain class
resembles some world w (dependent on w') in a reference class, possibly explicating the nature of
the nonsymmetric assessments [45] found in psychological experimentation when subjects are asked
to evaluate the degree by which an object “resembles” another. The results obtained in those
experiments suggest that human beings, when assessing similarity between objects, use one of them
(or a class of similar objects) as a reference landmark to describe the other. Such asymmetries might
be explained by noticing that, in general, $I(p \mid q) \ne I(q \mid p)$, indicating that the stronger stimulus
might generally be used to construct a reference class, which is then used to describe other stimuli.

The  degree  of  implication  of  one  proposition  by  another  can  be  readily  used  to  generate  a  measure 
of  similarity  between  propositions  that  generalizes  our  original  measure  of  similarity  between  possible 
worlds: 

$$S(p, q) = \min\bigl[ I(p \mid q), I(q \mid p) \bigr],$$

quantifying the degree by which the propositions p and q are equivalent.

It may be readily proved [44], from its definition and from the transitivity property of I, that S is
a reflexive, symmetric, and $\circledast$-transitive function between subsets of possible worlds. This similarity
function is the dual of the well-known Hausdorff distance, defined between subsets of a metric space as a
function of the distance between pairs of their members [9], which is given by the expression

$$\delta(A, B) = \max\Bigl[ \sup_{x \in A} \, \inf_{y \in B} \delta(x, y), \; \sup_{y \in B} \, \inf_{x \in A} \delta(x, y) \Bigr].$$
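
The duality with the Hausdorff distance can be verified on a finite universe. The sketch below is an illustration under stated assumptions: the four-world similarity matrix is hypothetical, propositions are nonempty sets of world indices, and the distance between worlds is taken to be one minus their similarity.

```python
import itertools

# Hypothetical similarity matrix over four possible worlds (illustrative values).
S = [
    [1.0, 0.9, 0.5, 0.2],
    [0.9, 1.0, 0.5, 0.2],
    [0.5, 0.5, 1.0, 0.2],
    [0.2, 0.2, 0.2, 1.0],
]

def implication(p, q):
    """I(p | q) on a finite universe."""
    return min(max(S[w][wq] for w in p) for wq in q)

def proposition_similarity(p, q):
    """S(p, q) = min(I(p | q), I(q | p))."""
    return min(implication(p, q), implication(q, p))

def hausdorff(A, B):
    """Hausdorff distance between A and B for the point distance 1 - S."""
    def d(x, y):
        return 1.0 - S[x][y]
    a_to_b = max(min(d(x, y) for y in B) for x in A)
    b_to_a = max(min(d(x, y) for x in A) for y in B)
    return max(a_to_b, b_to_a)

PROPS = [frozenset(c)
         for size in range(1, len(S) + 1)
         for c in itertools.combinations(range(len(S)), size)]

# Largest deviation from the duality S(p, q) = 1 - delta(p, q) over all pairs.
max_gap = max(abs((1.0 - proposition_similarity(p, q)) - hausdorff(p, q))
              for p, q in itertools.product(PROPS, repeat=2))
```

For singleton propositions the similarity between propositions reduces to the original similarity between the two worlds.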

The  result  expressed  by  the  transitive  property  of  the  degree  of  implication  may  be  stated  using 
modal  notation  in  the  form 

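$$q \Rightarrow \Pi_\alpha r \ \text{ and } \ r \Rightarrow \Pi_\beta p \ \text{ imply that } \ q \Rightarrow \Pi_{\alpha \circledast \beta}\, p,$$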

as  the  simplest  form  of  the  generalized  modus  ponens  rule  of  Zadeh. 

The relationship between this rule and the classical modus ponens is easier to perceive if it is
remembered that classical conditional propositions of the form “If q, then p,” simply state that the
set of q-worlds is a subset of the set of p-worlds. Such relationships of inclusion may also be described
in metric terms by saying that every q-world has a p-world (i.e., itself) that is as similar as possible
to it.

Logic structures, however, only allow us to say that either q implies p or that q implies its negation
$\neg p$, or that neither of those statements is true. By contrast, similarity relations allow measurement
of the amount by which a set must be “stretched” (as illustrated in Figure 1) to enclose another
set. Using such metrics, we may describe the generalized modus ponens as a relation between the
stretching required to reach p from any point of the set r, the stretching required to reach r from
any point of the set q, and the stretching required to reach p from any point of the set q.

In Section 5 we will derive alternative expressions for the generalized modus ponens that allow
the propagation of measures characterizing both degree of implication and degree of consistence, a dual
concept that plays, with respect to the notion of possibility, the function that is fulfilled by the
degree of implication function with respect to necessity. In those derivations, by introduction of
sharper bounds for certain conditional concepts, we will also be able to improve the quality of the
bounds provided by generalized modus ponens rules while being closer in spirit to its usual fuzzy-logic
formulation.

Figure 1: The Generalized Modus Ponens.

3.3.2  Degree  of  Consistence 

A notion that is dual to that of degree of implication is given by a function that measures the
pointwise proximity between pairs of possible worlds from an “optimistic” point of view, characterizing
the degree by which statements that are true in some worlds may apply in others. By contrast, the
degree of implication measures the extent by which statements that are true in p-worlds must hold
in q-worlds.

Definition: The degree of consistence of p and q is the value

$$C(p \mid q) = \sup_{w' \vdash q} \, \sup_{w \vdash p} S(w, w').$$


An immediate consequence of this definition is that $C(\cdot \mid \cdot)$ is a symmetric function that is
nondecreasing in both arguments (with respect to the $\Rightarrow$ order). It is also easy to see that the values
of the degree of consistence function are never smaller than the corresponding values of the degree
of implication function,

$$I(p \mid q) \le C(p \mid q),$$

as the amount of stretching required to reach p from some “convenient” q-world is smaller (i.e.,
higher values of S) than that required to reach p from any q-world. In general, however, the degree
of consistence function is not transitive, preventing the statement of a “compatibility” counterpart of
the generalized modus ponens rule. Its relationship with the degree of implication function, given
by the expression

$$C(p \mid q) = \sup_{w' \vdash q} I(p \mid w') = \sup_{w \vdash p} I(q \mid w),$$

will permit us, nonetheless, to derive a useful bound-propagation expression.
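
These properties of the degree of consistence can be observed on a finite universe. The sketch below again assumes a hypothetical four-world similarity matrix, with propositions as nonempty sets of world indices and illustrative function names. It also exhibits a concrete failure of transitivity for C.

```python
import itertools

# Hypothetical similarity matrix over four possible worlds (illustrative values).
S = [
    [1.0, 0.9, 0.5, 0.2],
    [0.9, 1.0, 0.5, 0.2],
    [0.5, 0.5, 1.0, 0.2],
    [0.2, 0.2, 0.2, 1.0],
]

def implication(p, q):
    """I(p | q): worst case over q-worlds of the best match among p-worlds."""
    return min(max(S[w][wq] for w in p) for wq in q)

def consistence(p, q):
    """C(p | q): best match over all pairs of a p-world and a q-world."""
    return max(S[w][wq] for w in p for wq in q)

PROPS = [frozenset(c)
         for size in range(1, len(S) + 1)
         for c in itertools.combinations(range(len(S)), size)]
PAIRS = list(itertools.product(PROPS, repeat=2))

symmetric = all(consistence(p, q) == consistence(q, p) for p, q in PAIRS)
bounded = all(implication(p, q) <= consistence(p, q) for p, q in PAIRS)

# C(p | q) as the maximum of I(p | w') over single q-worlds w'.
decomposes = all(consistence(p, q) == max(implication(p, {wq}) for wq in q)
                 for p, q in PAIRS)

# Counterexample to transitivity: r overlaps both p and q, yet p and q are far apart.
p, r, q = {0}, {0, 3}, {3}
c_pr, c_rq, c_pq = consistence(p, r), consistence(r, q), consistence(p, q)
```

In the counterexample both `c_pr` and `c_rq` equal 1 while `c_pq` is only 0.2, so no T-norm bound of the form used for I can hold for C.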

4  POSSIBILITY  AND  NECESSITY  DISTRIBUTIONS


This section presents interpretations of the major constructs of fuzzy logic, possibility and necessity
distributions, in terms of similarity-based structures. Possibility and necessity distributions are
functions  that  measure  the  proximity  of  either  all  or  some  of  the  worlds  in  the  evidential  set  to 
worlds  in  other  sets  that  are  employed  as  reference  landmarks. 

The  role  played  by  possibility  and  necessity  distributions  is  similar  to  that  performed  by  lower 
and  upper  bounds  of  probability  distributions  (or  by  the  belief  and  plausibility  functions  of  the 
Dempster-Shafer calculus of evidence) with respect to probability distributions. The essential
difference between these bounds and those provided by possibility/necessity pairs lies in the fundamentally
dissimilar character of what is being bounded: metric structures relating pairs of worlds in one case,
measures of set size in the other. Furthermore, in the model of possibilistic structures that is
presented in this note, necessity (possibility) distributions are any lower (upper) bounds of certain
metric functions rather than their “best” or “sharpest” bounds. The operations of fuzzy logic allow
computation of bounds for some of these measures as a function of bounds of other measures.

4.1  Inverse  of  a  Triangular  Norm 

When working in ordinary metric spaces, it is often convenient to express the conventional statement
of the triangular inequality, i.e.,

$$\delta(w, w') \le \delta(w, w'') + \delta(w'', w'),$$

in the equivalent form

$$\delta(w, w') \ge \bigl| \delta(w, w'') - \delta(w', w'') \bigr|,$$

which utilizes a form of inverse (i.e., the subtraction operator −) of the function used to express
the original inequality (i.e., the addition operator +). This notion of inverse may be directly
generalized [37] to provide us with the tools required to define possibility and necessity functions and to
derive useful forms of the generalized modus ponens involving either type of these constructs.

Definition: If $\circledast$ is a triangular norm, its pseudoinverse $\oslash$ is the function defined over pairs of
numbers in the unit interval of the real line by the expression

$$a \oslash b = \sup \{ c : b \circledast c \le a \}.$$

From this definition it is clear that $a \oslash b$ is nondecreasing in a and nonincreasing in b. Furthermore,
$a \oslash 0 = 1$ and $a \oslash 1 = a$ for any a in [0, 1]. Other important properties of the pseudoinverse function
are given in the works of Schweizer and Sklar [37], Trillas and Valverde [43], and Valverde [44].

Examples  of  the  pseudoinverses  of  important  triangular  norms  are  given  in  Table  1  together  with 
the  corresponding  conorms. 

Table 1: Triangular Norms, Conorms, and Pseudoinverses

Name          T-Norm a ⊛ b         Conorm a ⊕ b     Pseudoinverse a ⊘ b
Lukasiewicz   max(a + b − 1, 0)    min(a + b, 1)    min(1 + a − b, 1)
Product       ab                   a + b − ab       a/b, if b > a; 1, otherwise
Zadeh         min(a, b)            max(a, b)        a, if b > a; 1, otherwise
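
The closed forms in Table 1 can be compared against the defining supremum by brute force. The sketch below approximates the supremum on a uniform grid of candidate values, so agreement is only expected up to the grid spacing. The function names, the grid size `steps`, and the small tolerance added for floating-point rounding are assumptions made for this illustration.

```python
def t_lukasiewicz(a, b):
    return max(a + b - 1.0, 0.0)

def t_product(a, b):
    return a * b

def t_min(a, b):
    return min(a, b)

def pseudoinverse(t, a, b, steps=1000):
    """Approximate sup{c in [0, 1] : t(b, c) <= a} on a uniform grid."""
    candidates = (k / steps for k in range(steps + 1))
    # c = 0 always qualifies because t(b, 0) = 0, so the maximum is well defined.
    return max(c for c in candidates if t(b, c) <= a + 1e-12)

# Closed forms as listed in Table 1.
def inv_lukasiewicz(a, b):
    return min(1.0 + a - b, 1.0)

def inv_product(a, b):
    return a / b if b > a else 1.0

def inv_min(a, b):
    return a if b > a else 1.0

GRID = [k / 10 for k in range(11)]
PAIRS = [(t_lukasiewicz, inv_lukasiewicz), (t_product, inv_product), (t_min, inv_min)]

# Largest disagreement between the numerical supremum and the closed form.
max_error = max(abs(pseudoinverse(t, a, b) - closed(a, b))
                for t, closed in PAIRS for a in GRID for b in GRID)
```

With a grid of 1000 steps the disagreement stays within the grid spacing, and the boundary identities a ⊘ 0 = 1 and a ⊘ 1 = a are reproduced for all three T-norms.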

4.2  Unconditioned  Necessity  Distributions

We introduce first a family of functions that bound from below the value of the similarity between
any evidential world in $\mathcal{E}$ and some world where another proposition p is true. These unconditioned
necessity distributions are lower bounds for values of the degree of implication $I(p \mid \mathcal{E})$, which
measures the extent by which statements that are true in a reference set (i.e., the subset of p-worlds)
must hold in the evidential set.

As observed before, whenever $I(p \mid \mathcal{E}) = 1$, it is true, under minimal assumptions, that the
evidential subset $\mathcal{E}$ is a subset of the set of all p-worlds, or that p necessarily holds in $\mathcal{E}$. If, on
the other hand, $I(p \mid \mathcal{E}) = \alpha < 1$, then p must be stretched a certain amount (with smaller $\alpha$
corresponding to larger stretching) in order for one of its neighborhoods to encompass $\mathcal{E}$.

Definition: If $\mathcal{E}$ is an evidential set, then a function Nec(·) defined over propositions in the
language $\mathcal{L}$ is called an unconditioned necessity distribution for $\mathcal{E}$ if

$$\mathrm{Nec}(p) \le I(p \mid \mathcal{E}).$$

4.3  Unconditioned  Possibility  Distributions 

The dual counterpart of the unconditioned necessity distribution is provided by upper bounds of
the degree of consistence $C(p \mid \mathcal{E})$. Whenever $C(p \mid \mathcal{E}) = 1$, it is easy to see that, under minimal
assumptions, there exists a p-world w that is in the evidential set $\mathcal{E}$ or, equivalently, that p (for all
we know) is possibly true. If, on the other hand, $C(p \mid \mathcal{E}) = \alpha < 1$, then there exists a neighborhood
(of “size” $\alpha$) of some p-world that intersects the evidential set.

Definition: If $\mathcal{E}$ is an evidential set, then a function Poss(·) defined over propositions in the
language $\mathcal{L}$ is called an unconditioned possibility distribution for $\mathcal{E}$ if

$$\mathrm{Poss}(p) \ge C(p \mid \mathcal{E}).$$

Since the value Poss(p) of any possibility function Poss(·) is an upper bound of the value $\mathbf{C}(p \mid \mathcal{E})$ of the degree of consistence, while the corresponding value Nec(p) of any necessity function Nec(·) is a lower bound of $\mathbf{I}(p \mid \mathcal{E})$, it follows that values of a possibility function can never be smaller than the corresponding values of any necessity function, i.e., that

$$\mathrm{Nec}(p) \le \mathrm{Poss}(p).$$
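The two definitions above can be made concrete on a small finite universe. The following Python sketch is only an illustration under stated assumptions: the five worlds, the similarity S (one minus a normalized distance on a line), and the helper names `implication` and `consistence` are hypothetical, and the formulas used for I and C (infimum, respectively supremum, over evidential worlds of the best similarity to a p-world) are assumed to match the definitions given in the earlier sections.

```python
# Assumption: worlds are the points 0..4 on a line (illustrative only).
def S(u, v):
    """Similarity between two worlds: 1 minus a normalized distance."""
    return 1 - abs(u - v) / 4

def implication(prop, ref):
    """I(prop | ref): worst case, over ref-worlds, of the best similarity to a prop-world."""
    return min(max(S(w, u) for u in prop) for w in ref)

def consistence(prop, ref):
    """C(prop | ref): best case, over ref-worlds, of the best similarity to a prop-world."""
    return max(max(S(w, u) for u in prop) for w in ref)

evidence = {1, 2}   # evidential set
p = {2, 3}          # worlds where the proposition p holds

i_val = implication(p, evidence)   # largest admissible Nec(p)
c_val = consistence(p, evidence)   # smallest admissible Poss(p)

# Any lower bound of I and any upper bound of C are admissible distributions values.
nec_p, poss_p = 0.5, 1.0
print(i_val, c_val, nec_p <= i_val <= c_val <= poss_p)
```

In this toy setting the evidence does not fully entail p (the world 1 is not a p-world), so the degree of implication is below 1, while the degree of consistence equals 1 because the world 2 is both evidential and a p-world.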




4.4  Properties  of  Possibility  and  Necessity  Distributions 

In this subsection we will develop similarity-based interpretations for some basic formulae of possibilistic calculus. These expressions may be thought of as mechanisms that allow the extension of a partially known possibility distribution. For example, the property that

$$\max(\mathrm{Poss}(p), \mathrm{Poss}(q)) \ge \mathbf{C}(p \vee q \mid \mathcal{E}),$$

which is proved below, is the similarity interpretation of the standard rule that allows computation of the possibility value of a disjunction in fuzzy logic, i.e.,

$$\mathrm{Poss}(p \vee q) = \max(\mathrm{Poss}(p), \mathrm{Poss}(q)).$$


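Theorem: If p and q are propositions, and if the quantities Poss(p), Poss(q), Nec(p), and Nec(q) are such that

$$\mathrm{Nec}(p) \le \mathbf{I}(p \mid \mathcal{E}), \qquad \mathrm{Nec}(q) \le \mathbf{I}(q \mid \mathcal{E}),$$

$$\mathrm{Poss}(p) \ge \mathbf{C}(p \mid \mathcal{E}), \qquad \mathrm{Poss}(q) \ge \mathbf{C}(q \mid \mathcal{E}),$$

then the following statements (similarity-based interpretations of the basic laws of fuzzy logic) are valid:

$$\max(\mathrm{Nec}(p), \mathrm{Nec}(q)) \le \mathbf{I}(p \vee q \mid \mathcal{E}),$$
$$\max(\mathrm{Poss}(p), \mathrm{Poss}(q)) \ge \mathbf{C}(p \vee q \mid \mathcal{E}),$$
$$\min(\mathrm{Poss}(p), \mathrm{Poss}(q)) \ge \mathbf{C}(p \wedge q \mid \mathcal{E}).$$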

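Proof: Note first that since $\mathbf{C}(\cdot \mid \cdot)$ is nondecreasing (with respect to the $\Rightarrow$ order) in its arguments, it is true that

$$\mathrm{Poss}(p) \ge \mathbf{C}(p \mid \mathcal{E}) \ge \mathbf{C}(p \wedge q \mid \mathcal{E}),$$

$$\mathrm{Poss}(q) \ge \mathbf{C}(q \mid \mathcal{E}) \ge \mathbf{C}(p \wedge q \mid \mathcal{E}),$$

whenever $p \wedge q$ is satisfiable, from which it is easy to see that

$$\min(\mathrm{Poss}(p), \mathrm{Poss}(q)) \ge \mathbf{C}(p \wedge q \mid \mathcal{E}).$$

The corresponding result is obvious when $p \wedge q$ is nonsatisfiable.

A similar argument shows, for necessity functions, that

$$\max(\mathrm{Nec}(p), \mathrm{Nec}(q)) \le \mathbf{I}(p \vee q \mid \mathcal{E}).$$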

To prove the disjunctive law for possibilities, notice that if f is any function mapping elements of a general domain D into real numbers, then

$$\sup\{ f(d) : d \in A \cup B \} = \max\left[ \sup\{ f(d) : d \in A \},\ \sup\{ f(d) : d \in B \} \right].$$

From this equality, it is easy to see that if Poss(p) and Poss(q) are upper bounds of $\mathbf{C}(p \mid \mathcal{E})$ and $\mathbf{C}(q \mid \mathcal{E})$, respectively, then

$$\max(\mathrm{Poss}(p), \mathrm{Poss}(q)) \ge \mathbf{C}(p \vee q \mid \mathcal{E}),$$

completing the proof of the theorem. ∎

Note, however, that another law commonly given as an axiom for necessity functions does not hold in our interpretation. As illustrated in Figure 2, the distance from a point to the intersection of two sets may be strictly larger than the distances to either set (i.e., the similarity will be strictly smaller). In general, therefore, it is

$$\min(\mathrm{Nec}(p), \mathrm{Nec}(q)) \not\le \mathbf{I}(p \wedge q \mid \mathcal{E}),$$

making invalid, under this interpretation, the conjunctive law for necessities [11]

$$\mathrm{Nec}(p \wedge q) = \min(\mathrm{Nec}(p), \mathrm{Nec}(q)).$$


Figure  2:  Failure  of  Conjunctive  Necessity. 
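The contrast between the laws that hold and the conjunctive necessity law that fails can be checked numerically. The sketch below is a hypothetical example, not taken from the report: worlds are points 0..4 on a line, the similarity is one minus a normalized distance, the degrees I and C are computed with the inf-sup and sup-sup forms assumed from the earlier sections, and the tightest admissible values are used for Nec and Poss.

```python
def S(u, v):
    """Assumed similarity: 1 minus a normalized distance between points on a line."""
    return 1 - abs(u - v) / 4

def implication(prop, ref):
    """I(prop | ref) as an inf over ref-worlds of a sup over prop-worlds."""
    return min(max(S(w, u) for u in prop) for w in ref)

def consistence(prop, ref):
    """C(prop | ref) as a sup over ref-worlds of a sup over prop-worlds."""
    return max(max(S(w, u) for u in prop) for w in ref)

evidence = {2}
p, q = {1, 4}, {3, 4}          # p-worlds and q-worlds; p AND q holds only in world 4

nec_p, nec_q = implication(p, evidence), implication(q, evidence)
poss_p, poss_q = consistence(p, evidence), consistence(q, evidence)

disj_nec_ok = max(nec_p, nec_q) <= implication(p | q, evidence)
disj_poss_ok = max(poss_p, poss_q) >= consistence(p | q, evidence)
conj_poss_ok = min(poss_p, poss_q) >= consistence(p & q, evidence)
conj_nec_ok = min(nec_p, nec_q) <= implication(p & q, evidence)

print(disj_nec_ok, disj_poss_ok, conj_poss_ok, conj_nec_ok)  # True True True False
```

The evidential world 2 is close to a p-world (1) and to a q-world (3), but the only world satisfying both propositions (4) is farther away, which is the geometric situation that the figure describes.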

We may also note in this regard that the similarity-based model that is discussed here does not make use of the notion of negation either as a mechanism to generate dual concepts or in its own right as an important logical concept. It is the intent of the author to study, in the immediate future, alternative models where notions of negation and maximal dissimilarity play more substantive roles.

4.5  Conditional  Possibilities  and  Necessities 

The concepts of conditional possibility and necessity are closely related to the previously introduced unconditioned structures. These structures may be thought of as a characterization of the proximity of a world w to some or all of the worlds where a proposition p is true, given that w is similar in the degree 1 to the evidential set $\mathcal{E}$ (i.e., $w \vdash \mathcal{E}$). With this fact in mind, we could have used the somewhat baroque formulation

$$\mathbf{C}(p \mid \mathcal{E}) = \sup_{w \vdash \mathcal{E}} \left[ \mathbf{I}(p \mid w) \oslash \mathbf{I}(\mathcal{E} \mid w) \right]$$

to define unconditioned possibility distributions, a rather unnecessary effort if we consider that $\mathbf{I}(\mathcal{E} \mid w) = 1$ whenever $w \vdash \mathcal{E}$, showing its obvious equivalence to the simpler form used in Section 3.3.2 above. In spite of such observation, the above identity is important in understanding the purpose of the definitions given below. Those definitions interpret conditional possibilities and necessities as a measure of the proximity of worlds in the evidential set $\mathcal{E}$ to (some or all) worlds satisfying a (conditioned) proposition p relative to their proximity to (some or all) the worlds that satisfy another (conditioning) proposition q.

The mechanism used to specify that relationship, which is closely related in spirit to results of Valverde [44] on the structure of indistinguishability relations, is based on the pseudoinverse function introduced in Section 4.1. The basic idea used by these definitions is also illustrated in Figure 3, where, from the perspective of the evidential world w, the similarity between the p-world u and the q-world v is estimated by means of an inequality that generalizes the "absolute value" form of the triangular inequality, i.e.,

$$\delta(u, v) \ge |\delta(u, w) - \delta(v, w)|,$$

to its similarity-based form

$$S(u, v) \le \min\left[ S(u, w) \oslash S(v, w),\ S(v, w) \oslash S(u, w) \right].$$


Figure  3:  Similarities  as  Viewed  from  the  Evidential  Set. 

The  required  interplay  between  similarities  to  conditioning  and  conditioned  sets  is  captured  by 
the  following  definitions. 

Definition: Let $\mathcal{E}$ be an evidential set. A function Nec(·|·) mapping pairs of propositions in the language $\mathcal{L}$ into [0, 1] is called a conditional necessity distribution for $\mathcal{E}$ if

$$\mathrm{Nec}(q \mid p) \le \inf_{w \vdash \mathcal{E}} \left[ \mathbf{I}(q \mid w) \oslash \mathbf{I}(p \mid w) \right],$$

for any propositions p and q in $\mathcal{L}$.

Definition: Let $\mathcal{E}$ be an evidential set. A function Poss(·|·) mapping pairs of propositions in the language $\mathcal{L}$ into [0, 1] is called a conditional possibility distribution for $\mathcal{E}$ if

$$\mathrm{Poss}(q \mid p) \ge \sup_{w \vdash \mathcal{E}} \left[ \mathbf{I}(q \mid w) \oslash \mathbf{I}(p \mid w) \right],$$

for any propositions p and q in $\mathcal{L}$.

It is easy to see, from these definitions, that the values of a conditional necessity distribution are never larger than the corresponding values of any conditional possibility distribution, i.e.,

$$\mathrm{Nec}(q \mid p) \le \mathrm{Poss}(q \mid p).$$

Furthermore, since $\mathbf{I}(\cdot \mid \cdot)$ is $\circledast$-transitive, then

$$\mathbf{I}(q \mid w) \ge \mathbf{I}(q \mid p) \circledast \mathbf{I}(p \mid w).$$

From this inequality and the definition of pseudoinverse of a triangular norm, it is easy to see that $\mathbf{I}(q \mid w) \oslash \mathbf{I}(p \mid w) \ge \mathbf{I}(q \mid p)$ for every world w, so that the largest necessity function allowed by the definition satisfies the inequality

$$\mathrm{Nec}(q \mid p) \ge \mathbf{I}(q \mid p),$$

i.e., the bounds for necessity functions provided by the evidential-set perspective are stronger than those that can be obtained by direct use of the degree of implication function.9
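The conditional bounds can be evaluated directly on a finite universe. The following Python sketch is an illustration under assumptions that are not in the report: worlds are points 0..4 on a line, the similarity is one minus a normalized distance, the T-norm is that of Łukasiewicz, and the pseudoinverse is written in its closed form for that T-norm. The names `closeness`, `nec_bound`, and `poss_bound` are hypothetical.

```python
def S(u, v):
    """Assumed similarity between worlds on a line."""
    return 1 - abs(u - v) / 4

def pseudoinverse(a, b):
    """a (/) b = sup{c : T(b, c) <= a} for the Lukasiewicz T-norm T(b, c) = max(0, b + c - 1)."""
    return min(1.0, 1 - b + a)

def closeness(prop, w):
    """I(prop | w): similarity of the world w to the nearest prop-world."""
    return max(S(w, u) for u in prop)

def implication(q, p):
    """I(q | p): worst case over p-worlds of their closeness to q."""
    return min(closeness(q, w) for w in p)

evidence, p, q = {1, 3}, {2, 3}, {0, 4}

terms = [pseudoinverse(closeness(q, w), closeness(p, w)) for w in sorted(evidence)]
nec_bound = min(terms)    # largest admissible Nec(q|p)
poss_bound = max(terms)   # smallest admissible Poss(q|p)

print(nec_bound, poss_bound, implication(q, p))  # 0.75 1.0 0.5
```

Here the evidential-set bound for conditional necessity (0.75) is strictly larger than the degree of implication I(q | p) (0.5), in agreement with the inequality stated above.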

Note also that if Nec(p) = 1, indicating that $\mathbf{I}(p \mid \mathcal{E}) = 1$, and if Nec(q|p) = 1, then the above definition of conditional necessity shows that $\mathbf{I}(q \mid \mathcal{E}) = 1$, indicating that Nec(q) may be taken to be equal to 1, thus generalizing the well-known axiom (consequential closure) of certain modal systems (e.g., the system T, as discussed in Hughes and Cresswell [21])

If $Np$ and $N(p \rightarrow q)$, then $Nq$.

The definitions above can also be further interpreted as a way to compare the similarities between evidential worlds and those in the conditioning and conditioned sets by noting that whenever

$$\mathbf{I}(q \mid w) \ge \mathbf{I}(p \mid w)$$

for every evidential world $w \vdash \mathcal{E}$, then Nec(q|p) may be chosen to be equal to 1. Similarly, if there exists some world $w \vdash \mathcal{E}$ where this inequality holds, then Poss(q|p) = 1. In either case, however, the maximum value for the conditional distribution (i.e., 1) is reached when the proximity of one evidential world w (in the case of possibilities) or of every one of them (in the case of necessities) to a world w' in the conditioned set exceeds the proximity of w to the conditioning set p. In either case, once again resorting to an apparent notational overkill, we may state this fact by means of the identity function $\tau$ in the unit interval:

$$\tau: [0, 1] \to [0, 1] : \alpha \mapsto \alpha,$$

in the form

$$\mathbf{I}(q \mid w) \ge \tau(\mathbf{I}(p \mid w)),$$

for some $w \vdash \mathcal{E}$ in the case of possibilities, with the same inequality holding for every $w \vdash \mathcal{E}$ in the case of necessities. We may, however, conceive of other functions

$$\gamma: [0, 1] \to [0, 1] : \alpha \mapsto \gamma(\alpha),$$

with $\gamma(\alpha) \ge \alpha$ to specify a stronger form of implication, as illustrated in Figure 4, i.e.,

$$\mathbf{I}(q \mid w) \ge \gamma(\mathbf{I}(p \mid w)).$$

Similarly, one may also conceive of functions $\psi$ with $\psi(\alpha) \le \alpha$ that may be used to model weaker forms of implication.

9 A dual inequality for possibilities involving $\mathbf{C}(q \mid p)$ does not hold in general. It is easy to see, however, that $\mathbf{C}(q \mid \mathcal{E}) \oslash \mathbf{I}(p \mid \mathcal{E})$ is a possibility function for q given p.


Figure  4:  Examples  of  Possible  Similarity  Relationships  between  Conditioning  and  Conditioned  Sets. 

Possibilistic calculi based on the propagation of truth-mappings of this type, first proposed by Baldwin [2], are utilized in the RUM [4,5] and MILORD [18] expert systems. The particular case when $\gamma = \tau$, stating that every $\alpha$-cut of the conditioning proposition p is fully enclosed (in the conventional sense) in the $\alpha$-cut of the conditioned proposition q, has been called the truth mapping in the fuzzy logic literature.

The primary purpose of conditional distributions, however, is to provide a quantitative measure of the strength by which one proposition may be said to imply another, with a view to extending inferential procedures by means of structures that superimpose the topological notion of continuity upon a logical framework concerned with propositional validity.

5 GENERALIZED INFERENCE


The  major  inferential  tool  of  fuzzy  logic  is  the  compositional  rule  of  inference  of  Zadeh  [53],  which 
generalizes  the  corresponding  classical  rule  of  inference  by  its  ability  to  infer  valid  statements  even 
when  a  perfect  match  between  facts  and  rule  antecedent  does  not  exist,  i.e., 

from $\dfrac{p \qquad p \rightarrow q}{q}$ to its "approximate" version $\dfrac{p' \qquad p \rightarrow q}{q'}$,


where  p'  and  q'  are  similar  to  p  and  q,  respectively.  In  this  sense,  the  generalized  modus  ponens 
operates  as  an  “interpolation”  (or,  more  precisely,  as  an  “extrapolation”)  procedure  in  possible-world 
space. 

Unlike  the  interpolation  procedures  of  numerical  analysis,  however,  which  yield  estimates  of 
function  value,  this  extrapolation  procedure  approximates  truth  in  the  sense  that  it  produces  a 
proposition  that  is  both  more  general  than  the  consequent  of  the  inferential  rule  and  resembles  it 
to  some  degree  (which  is  a  function  of  the  degree  by  which  p'  resembles  p).  The  “extrapolated 
conclusion,”  however,  is  a  correctly  derived  proposition,  i.e.,  the  result  of  a  sound  logical  procedure 
rather  than  of  an  approximate  heuristic  technique. 

5.1  Generalized  Modus  Ponens 

The theorems that are proven below are based on the use of a family $\mathcal{P}$ of propositions that partitions the universe of discourse U in the sense that every possible world will satisfy at least one proposition in $\mathcal{P}$.

Definition: If $\mathcal{P}$ is a subset of satisfiable propositions in $\mathcal{L}$ such that, if w is a possible world in the universe U, there exists a proposition p in $\mathcal{P}$ such that $w \vdash p$, then the family $\mathcal{P}$ is called a partition of U.

These results make use of information such as the values of the unconditioned necessity (resp., possibility) distributions for antecedent propositions p in the family $\mathcal{P}$ together with the values Nec(q|p) (resp., Poss(q|p)) to "extend" the unconditioned distributions to the "consequent" proposition q. In this sense, these findings interpret, in the same spirit used in the theorem of Section 4.4 for other basic laws, the generalized modus ponens laws of fuzzy logic:

$$\mathrm{Nec}(q) = \sup_{\mathcal{P}} \left[ \mathrm{Nec}(q \mid p) \circledast \mathrm{Nec}(p) \right],$$

$$\mathrm{Poss}(q) = \sup_{\mathcal{P}} \left[ \mathrm{Poss}(q \mid p) \circledast \mathrm{Poss}(p) \right].$$

Theorem (Generalized Modus Ponens for Necessity Functions): Let $\mathcal{P}$ be a partition of U and let q be a proposition. If Nec(p) and Nec(q|p) are real values, defined for every proposition p in the partition $\mathcal{P}$, such that

$$\mathrm{Nec}(p) \le \mathbf{I}(p \mid \mathcal{E}),$$

$$\mathrm{Nec}(q \mid p) \le \inf_{w \vdash \mathcal{E}} \left[ \mathbf{I}(q \mid w) \oslash \mathbf{I}(p \mid w) \right],$$

then the following inequality is valid:

$$\sup_{\mathcal{P}} \left[ \mathrm{Nec}(q \mid p) \circledast \mathrm{Nec}(p) \right] \le \mathbf{I}(q \mid \mathcal{E}).$$

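Proof: Note first that since $\oslash$ is nonincreasing in its second argument and since

$$\mathbf{I}(p \mid \mathcal{E}) \le \mathbf{I}(p \mid w)$$

for every evidential world w, it is

$$\mathrm{Nec}(q \mid p) \le \inf_{w \vdash \mathcal{E}} \left[ \mathbf{I}(q \mid w) \oslash \mathbf{I}(p \mid w) \right] \le \inf_{w \vdash \mathcal{E}} \left[ \mathbf{I}(q \mid w) \oslash \mathbf{I}(p \mid \mathcal{E}) \right].$$

It follows then from the monotonicity and continuity of $\circledast$ with respect to its arguments that

$$\mathrm{Nec}(p) \circledast \mathrm{Nec}(q \mid p) \le \mathbf{I}(p \mid \mathcal{E}) \circledast \inf_{w \vdash \mathcal{E}} \left[ \mathbf{I}(q \mid w) \oslash \mathbf{I}(p \mid \mathcal{E}) \right]$$
$$= \inf_{w \vdash \mathcal{E}} \mathbf{I}(p \mid \mathcal{E}) \circledast \left( \mathbf{I}(q \mid w) \oslash \mathbf{I}(p \mid \mathcal{E}) \right)$$
$$\le \inf_{w \vdash \mathcal{E}} \mathbf{I}(q \mid w)$$
$$= \mathbf{I}(q \mid \mathcal{E}),$$

since

$$\mathbf{I}(p \mid \mathcal{E}) \circledast \left( \mathbf{I}(q \mid w) \oslash \mathbf{I}(p \mid \mathcal{E}) \right) \le \mathbf{I}(q \mid w),$$

because of the definition of $\oslash$ and the continuity of $\circledast$.

Since the above inequality is valid for any proposition p in $\mathcal{P}$, the theorem follows.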

A  dual  result  also  holds  for  possibility  functions. 

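Theorem (Generalized Modus Ponens for Possibility Functions): Let $\mathcal{P}$ be a partition of U and let q be a proposition. If Poss(p) and Poss(q|p) are real values, defined for every proposition p in $\mathcal{P}$, such that

$$\mathrm{Poss}(p) \ge \mathbf{C}(p \mid \mathcal{E}),$$

$$\mathrm{Poss}(q \mid p) \ge \sup_{w \vdash \mathcal{E}} \left[ \mathbf{I}(q \mid w) \oslash \mathbf{I}(p \mid w) \right],$$

then the following inequality is valid:

$$\sup_{\mathcal{P}} \left[ \mathrm{Poss}(q \mid p) \circledast \mathrm{Poss}(p) \right] \ge \mathbf{C}(q \mid \mathcal{E}).$$

Proof: Note first that if w is an evidential world, then

$$\mathbf{C}(p \mid \mathcal{E}) \ge \mathbf{I}(p \mid w).$$

It follows then from the nonincreasing nature of $\oslash$ with respect to its second argument that

$$\mathrm{Poss}(q \mid p) \ge \sup_{w \vdash \mathcal{E}} \left[ \mathbf{I}(q \mid w) \oslash \mathbf{I}(p \mid w) \right] \ge \sup_{w \vdash \mathcal{E}} \left[ \mathbf{I}(q \mid w) \oslash \mathbf{C}(p \mid \mathcal{E}) \right],$$

and, therefore, that

$$\mathrm{Poss}(q \mid p) \circledast \mathrm{Poss}(p) \ge \sup_{w \vdash \mathcal{E}} \left[ \mathbf{I}(q \mid w) \oslash \mathbf{C}(p \mid \mathcal{E}) \right] \circledast \mathbf{C}(p \mid \mathcal{E}).$$

Taking now, in the above expression, the supremum with respect to all propositions p in $\mathcal{P}$, it is

$$\sup_{\mathcal{P}} \left[ \mathrm{Poss}(q \mid p) \circledast \mathrm{Poss}(p) \right] \ge \sup_{\mathcal{P}} \left\{ \sup_{w \vdash \mathcal{E}} \left[ \mathbf{I}(q \mid w) \oslash \mathbf{C}(p \mid \mathcal{E}) \right] \circledast \mathbf{C}(p \mid \mathcal{E}) \right\}. \qquad (1)$$

Note, however, that since $\mathcal{P}$ is a partition, there always exists a proposition p in $\mathcal{P}$ such that $\mathbf{C}(p \mid \mathcal{E}) = 1$ (i.e., p "intersects" $\mathcal{E}$) and, therefore,

$$\sup_{\mathcal{P}} \left\{ \sup_{w \vdash \mathcal{E}} \left[ \mathbf{I}(q \mid w) \oslash \mathbf{C}(p \mid \mathcal{E}) \right] \circledast \mathbf{C}(p \mid \mathcal{E}) \right\} \ge \sup_{w \vdash \mathcal{E}} \left[ \mathbf{I}(q \mid w) \oslash \mathbf{C}(p \mid \mathcal{E}) \right] \circledast \mathbf{C}(p \mid \mathcal{E})$$
$$= \sup_{w \vdash \mathcal{E}} \mathbf{I}(q \mid w)$$
$$= \mathbf{C}(q \mid \mathcal{E}). \qquad (2)$$

The thesis follows at once by combination of the inequalities (1) and (2). ∎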


Finally, notice also that, although the theorems above have been characterized as duals, it is not necessary that $\mathcal{P}$ be a partition for the generalized modus ponens for necessities to hold, while the proof of its possibilistic counterpart relies on such an assumption. It should be clear, however, that richer propositional collections $\mathcal{P}$ would lead to better lower bounds for values of the degree of implication $\mathbf{I}(q \mid \mathcal{E})$.
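Both generalized modus ponens inequalities can be verified on a toy universe. The sketch below is a hypothetical illustration and not part of the report: it assumes worlds 0..4 on a line, a similarity equal to one minus a normalized distance, the Łukasiewicz T-norm with its closed-form pseudoinverse, and the tightest admissible values for every Nec and Poss quantity.

```python
def S(u, v):
    """Assumed similarity between worlds on a line."""
    return 1 - abs(u - v) / 4

def t_norm(a, b):
    """Lukasiewicz T-norm (an assumption; any continuous T-norm could be used)."""
    return max(0.0, a + b - 1)

def pseudoinverse(a, b):
    """a (/) b = sup{c : t_norm(b, c) <= a} for the Lukasiewicz T-norm."""
    return min(1.0, 1 - b + a)

def closeness(prop, w):
    """I(prop | w) for a single world w."""
    return max(S(w, u) for u in prop)

evidence = {1, 2}
partition = [{0, 1}, {2, 3}, {4}]   # every world satisfies at least one member
q = {3, 4}

def nec(prop):
    return min(closeness(prop, w) for w in evidence)    # I(prop | E)

def poss(prop):
    return max(closeness(prop, w) for w in evidence)    # C(prop | E)

def cond_terms(cons, ante):
    return [pseudoinverse(closeness(cons, w), closeness(ante, w)) for w in evidence]

# sup over the partition of Nec(q|p) (*) Nec(p) and of Poss(q|p) (*) Poss(p)
nec_q = max(t_norm(min(cond_terms(q, p)), nec(p)) for p in partition)
poss_q = max(t_norm(max(cond_terms(q, p)), poss(p)) for p in partition)

print(nec_q, nec(q), poss_q, poss(q))  # 0.5 0.5 1.0 0.75
```

In this instance the necessity bound obtained by inference coincides with I(q | E), while the inferred possibility bound (1.0) is looser than C(q | E) (0.75); both outcomes are consistent with the two theorems.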


5.2  Variables 

The $\circledast$-transitivity property of $\mathbf{I}$ is the essential fact expressing the relationships between the degrees of implication of three propositions that were proven in the previous section. The statements of
these  relations  in  most  works  devoted  to  fuzzy  logic  are  made,  however,  using  special  subsets  of  the 
universe  of  discourse  that  are  described  through  the  important  notion  of  variable.  Introduction  of 
this  concept,  which  is  also  central  to  other  approximate  reasoning  methodologies,  permits  us  to  make 
a  clearer  distinction  between  similarities  defined,  in  some  absolute  sense,  from  the  joint  viewpoint  of 
several  respects  and  related  proximity  measures  that  compare  objects  (in  our  case,  possible  worlds) 
from  the  marginal  viewpoint  of  one  or  more  variables. 

In what follows, we will assume that only certain propositions, specifying the value of a system variable belonging to a finite set

$$\mathcal{V} = \{ X, Y, Z, \ldots \},$$

will be used to characterize possible worlds.

The propositions of interest are those formed by logical combination of statements of the type

"The value of the variable V is v,"

where V is in the variable set $\mathcal{V}$ and where v is a specific value in the domain $\mathcal{D}(V)$ of the variable V.

We  will  also  assume  that,  in  any  possible  world,  the  value  of  any  variable  is  a  member  of  the 
corresponding  domain  of  definition  of  the  variable.  In  the  context  of  our  discussion,  we  will  not 
need  to  make  special  assumptions  about  the  scalar  or  numeric  nature  of  the  state  variables,  using 
the  notion  in  the  same  primitive  and  general  sense  in  which  it  is  customarily  used  in  the  predicate 
calculus. 

We will be especially interested in subsets, called variable-sets, of the universe U consisting of worlds where the value of some variable V is equal to a specified value v. We will denote by [X = x] (similarly [Y = y], etc.) the set of all possible worlds where the proposition "The value of the variable X is x" is true. Clearly, the variable-sets in the collection

$$\{\, [X = x] : x \text{ is in } \mathcal{D}(X) \,\}$$

partition the universe into disjoint subsets. These collections have recently been used to characterize the concept of rough sets [30], of importance in many information-system analysis problems, including some that arise in the context of approximate reasoning. A similar notion has also been used to describe algorithms for the combination of probabilities and of belief functions [39].

To simplify the notation we will write

$$w \vdash x, \quad w \vdash y, \quad \ldots$$

as shorthand for $w \vdash [X = x]$, $w \vdash [Y = y]$, ..., respectively.

5.2.1  Possibilistic  Structures  and  Laws 

The usual statements of the laws of fuzzy logic are made, as mentioned before, through the use of variables rather than by means of general symbolic expressions. It is customary, for example, to speak of the possibility of the variable X taking the value x, to describe the value that a possibility function for an evidential set $\mathcal{E}$ attains for the proposition [X = x].

In our model, we will say, therefore, that a function

$$\mathrm{Poss}(\cdot): \mathcal{D}(X) \to [0, 1]$$

is a possibility function for the evidential set $\mathcal{E}$ and the variable X, whenever

$$\mathrm{Poss}(x) \ge \mathbf{C}([X = x] \mid \mathcal{E}),$$

for all values x in the domain $\mathcal{D}(X)$. Similarly, we will say that Nec(·) is a necessity function for X whenever

$$\mathrm{Nec}(x) \le \mathbf{I}([X = x] \mid \mathcal{E}),$$

for all values x in $\mathcal{D}(X)$.

If possibility and necessity distributions are defined in this way as point functions in the variable domain $\mathcal{D}(X)$, then it is possible to use the disjunctive laws of fuzzy logic proved in Section 4.4 to extend their definition over the power set of $\mathcal{D}(X)$, i.e.,

$$\mathrm{Nec}(A \cup B) = \max\left[ \mathrm{Nec}(A), \mathrm{Nec}(B) \right],$$

$$\mathrm{Poss}(A \cup B) = \max\left[ \mathrm{Poss}(A), \mathrm{Poss}(B) \right],$$

where A and B are subsets of the domain $\mathcal{D}(X)$. These equations are usually given as the basic disjunctive laws of possibility distributions.
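The extension from point values to subsets of the domain amounts to taking a maximum over the members of the subset. The short Python sketch below illustrates this; the domain values, the numbers, and the function name `extend` are hypothetical, and assigning 0 to the empty subset is a convention assumed here rather than something stated in the text.

```python
# Assumed point possibility values over a three-element domain D(X).
poss_point = {"a": 0.25, "b": 1.0, "c": 0.5}

def extend(point_values, subset):
    """Extend a point function to a subset of the domain by maximization.

    The value 0.0 for the empty subset is an assumed convention.
    """
    return max((point_values[x] for x in subset), default=0.0)

A, B = {"a"}, {"a", "c"}

poss_A = extend(poss_point, A)
poss_B = extend(poss_point, B)
poss_union = extend(poss_point, A | B)

# Disjunctive law and monotonicity with respect to set inclusion.
print(poss_union == max(poss_A, poss_B), poss_A <= poss_B)  # True True
```

The same helper applies unchanged to a table of point necessity values, in which case the result is a lower bound of the corresponding degree of implication.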

Note  that,  using  such  extensions,  both  possibility  and  necessity  functions  are  nondecreasing 
functions  (with  respect  to  the  order  induced  by  set  inclusion).  The  value  of  Nec(A)  measures 
the  extent  by  which  the  evidence  supports  the  statement  that  the  variable  value  necessarily  lies  in 
the  subset  A  of  its  domain  of  definition,  with  a  dual  interpretation  being  applicable  for  possibility 
distributions. 

5.2.2  Marginal  and  Joint  Possibilities 

The  original  similarity  relation  introduced  in  Section  3.1  may  be  considered  to  be  a  measure  of 
proximity  between  possible  worlds  from  the  joint  viewpoint  of  all  system  variables.  The  notion 
of  variable  permits,  however,  the  definition  of  similarities  from  the  restricted  viewpoint  of  some 
variables  or  subsets  of  variables. 

These restricted perspectives play a role with respect to the original similarity S that is analogous to that of marginal probability distributions with respect to joint probability distributions. To derive useful expressions that describe similarities between two values x and x' of the same variable X, it should be noted first that the degree of implication $\mathbf{I}(\cdot \mid \cdot)$ is transitive. This fact permits the application of a theorem of Valverde [44] to define a function $S_X$ by means of the expression

$$S_X: \mathcal{D}(X) \times \mathcal{D}(X) \to [0, 1] : (x, x') \mapsto \min\left[ \mathbf{I}(x \mid x'),\ \mathbf{I}(x' \mid x) \right].$$

Defined in this way as a "symmetrization" of the preorder induced by the degree of implication $\mathbf{I}(\cdot \mid \cdot)$, the marginal similarity $S_X$ has the properties of a similarity function. Furthermore, the "projection" operation entailed by the use of $\mathbf{I}(x \mid x')$ (based on the projection of every x'-world into the set of x-worlds) may be considered to be the basic mechanism to transform the original similarity function into one that only discerns differences in the values of the variable X.

It must be noted, however, that, unless additional assumptions are made about the nature of the original similarity S, the function $S_X$ fails to satisfy the intuitive requirement

$$S(w, w') \le S_X(x, x'),$$

whenever $w \vdash x$ and $w' \vdash x'$, i.e., the requirement that the similarity between two objects from a restricted viewpoint is always at least as high as their similarity from more general regards that encompass additional criteria of comparison.
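A small instance shows how the intuitive requirement can fail. In the Python sketch below, which is a hypothetical example and not taken from the report, worlds are pairs (x, y), the universe is deliberately not a full Cartesian product, and the joint similarity is one minus a normalized city-block distance; the marginal similarity is computed by symmetrizing the degree of implication between variable-sets.

```python
# Assumed universe: three worlds given as (x, y) pairs.
worlds = [(0, 0), (1, 0), (1, 1)]

def S(w, v):
    """Assumed joint similarity: 1 minus a normalized city-block distance."""
    return 1 - (abs(w[0] - v[0]) + abs(w[1] - v[1])) / 4

def x_worlds(x):
    """The variable-set [X = x]."""
    return [w for w in worlds if w[0] == x]

def implication(x, x_ref):
    """I(x | x_ref): worst case, over x_ref-worlds, of the best similarity to an x-world."""
    return min(max(S(w, v) for w in x_worlds(x)) for v in x_worlds(x_ref))

def marginal_similarity(x, x_other):
    """S_X(x, x') = min[ I(x | x'), I(x' | x) ]."""
    return min(implication(x, x_other), implication(x_other, x))

s_joint = S((0, 0), (1, 0))         # worlds with X = 0 and X = 1
s_marginal = marginal_similarity(0, 1)

print(s_joint, s_marginal)  # 0.75 0.5
```

The two worlds compared differ only in X, yet their joint similarity (0.75) exceeds the marginal similarity of their X values (0.5), because the infimum in the degree of implication is driven by the more distant world (1, 1).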

Although considerable research remains to identify alternative definitions of marginal similarities that are not hampered by this problem, a basic result of Valverde [44], presented in Section 6.2 below, appears to provide the essential tool that must be employed in order to produce the required coarser measures. The role of additional reasonable assumptions that might be demanded from S so as to facilitate the construction of marginal similarities with desirable characteristics is also the object of current investigations of the author.

5.2.3 Conditional Distributions and Generalized Inference

The basic conditional structures of fuzzy logic are usually defined as elastic constraints that restrict the values of a variable given those of another. By simple extension of our previous convention to conditional structures, we will write Nec(y|x) and Poss(y|x) as shorthand for

$$\mathrm{Nec}([Y = y] \mid [X = x]) \quad \text{and} \quad \mathrm{Poss}([Y = y] \mid [X = x]),$$

respectively.

If  a  classical  (i.e.,  Boolean)  inferential  rule  of  the  type 

“If  X  =  x,  then  Y  is  in  R{x)” 


is thought of as the definition of a relation R defined over pairs (x, y) in the Cartesian product $X \times Y$, then such a relation may be used to define a multivalued mapping that maps possible values of X into possible values of Y, as illustrated in Figure 5.




Figure  5:  Inference  as  a  Compatibility  Relation. 

Such a compatibility relation perspective was an essential element of the original formulations of both the Dempster-Shafer calculus of evidence [8], where distributions in some space (i.e., the domain of some variable X) are mapped into distributions in another space (i.e., the domain of another variable Y) by direct transfer of "mass" from individual values to the union of their mapped projections, and the compositional rule of inference [51].

Note that, whenever Poss(y|x) = 1, if the bound is actually attained, i.e., if

$$\sup_{w \vdash \mathcal{E}} \left[ \mathbf{I}(y \mid w) \oslash \mathbf{I}(x \mid w) \right] = 1,$$

then it is possible for an evidential world w in [X = x] (i.e., $\mathbf{I}(x \mid w) = 1$) to be such that $w \vdash y$. Pairs (x, y) such that Poss(y|x) = 1 may be considered to approximate the core10 of a generalized inferential relation that allows the determination of bounds for the similarity between evidential worlds and those in the variable set [Y = y] on the basis of knowledge of similar bounds applicable to the variable set [X = x]. This relation, which is the fuzzy extension of the classical compatibility mapping R illustrated in Figure 5, may be thought of as a descriptor of the behavior, for x-worlds, of the values of the variable Y "near" R. The compatibility relation is itself approximated by (or embedded in) the core of the conditional possibility distribution, i.e., worlds w such that $w \vdash x$ and $w \vdash y$, with Poss(y|x) = 1.

Since the collection of the sets [X = x] partitions the universe U into disjoint sets, then the generalized modus ponens laws may be readily stated in terms of variable values as

$$\mathrm{Nec}(y) = \sup_{x} \left[ \mathrm{Nec}(y \mid x) \circledast \mathrm{Nec}(x) \right],$$

$$\mathrm{Poss}(y) = \sup_{x} \left[ \mathrm{Poss}(y \mid x) \circledast \mathrm{Poss}(x) \right],$$

clearly showing the basic nature of the inferential mapping as the composition of relational combination (i.e., $\circledast$-"intersection") and projection (i.e., maximization).
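The variable form of the law is a sup-T composition of a conditional table with a point distribution. The Python sketch below shows the two steps (combination by a T-norm, then projection by maximization) on made-up data; the variable values, the numbers, and the function name `compose` are hypothetical, and the minimum is used as the T-norm only as one admissible choice.

```python
def t_norm(a, b):
    """Minimum T-norm (an assumed choice; other T-norms could be substituted)."""
    return min(a, b)

# Assumed possibility distribution for X and conditional possibilities Poss(y|x).
poss_x = {"low": 1.0, "mid": 0.5, "high": 0.0}
poss_y_given_x = {
    ("slow", "low"): 1.0, ("slow", "mid"): 0.5, ("slow", "high"): 0.0,
    ("fast", "low"): 0.25, ("fast", "mid"): 1.0, ("fast", "high"): 1.0,
}

def compose(cond, marginal, y_values):
    """Poss(y) = sup over x of [ Poss(y|x) (*) Poss(x) ]."""
    return {
        y: max(t_norm(cond[(y, x)], px) for x, px in marginal.items())
        for y in y_values
    }

poss_y = compose(poss_y_given_x, poss_x, ["slow", "fast"])
print(poss_y)  # {'slow': 1.0, 'fast': 0.5}
```

Replacing `max` by a supremum over a continuous domain, and the tables by membership functions, gives the usual statement of the compositional rule of inference.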


5.2.4  Fuzzy  Implication  Rules 

In  this  section  we  will  examine  proposed  interpretations  for  conditional  rules,  usually  stated  in  the 
form 

If  X  is  A,  then  Y  is  B , 

within  the  context  of  possibilistic  logic.  While,  in  two-valued  logic,  any  such  rule  simply  states  that 
whenever  a  condition  A  is  true,  another  condition  B  also  holds,  various  interpretations  have  been 
proposed  for  rules  expressing  other  notions  of  conditional  truth. 

In the case of probabilities, for example, degrees of conditionality have been modeled either by means of conditional probability values Prob(B | A), which measure the likelihood of B given the assumed truth of A, or by the alternative interpretation $\mathrm{Prob}(\neg A \vee B)$, used by Nilsson [29] in his probabilistic logic, which essentially quantifies the probability that a rule is a valid component of a knowledge base. Either one of these interpretations is valid in particular contexts, being, respectively, the probabilistic extensions of the so-called "de re," i.e.,

$$p \rightarrow \Box q,$$

and "de dicto," i.e.,

$$\Box (p \rightarrow q),$$

interpretations of conditionals in modal logic.

10 The core of a fuzzy set $\mu: U \to [0, 1]$ is the set of all points w such that $\mu(w) = 1$, i.e., the points that "fully" belong to $\mu$.

In fuzzy logic, two major interpretations have been advanced to translate conditional rules,11 with A and B corresponding to the fuzzy sets

$$\mu_A: X \to [0, 1], \quad \text{and} \quad \mu_B: Y \to [0, 1].$$

The first interpretation was originally proposed by Zadeh [52], as a formal translation of the statement

If $\mu_A$ is a possibility for X, then $\mu_B$ is a possibility distribution for Y.

This conditional statement, which may be regarded as a constraint on the values of one variable given those of another, states the existence of a conditional possibility function Poss(·|·) such that

$$\mu_B(y) \ge \sup_{x} \left[ \mathrm{Poss}(y \mid x) \circledast \mu_A(x) \right] \ge \mathrm{Poss}(y \mid x) \circledast \mu_A(x).$$

Recalling now the definition and properties of the pseudoinverse, we may restate this particular interpretation as

$$\mathrm{Poss}(y \mid x) = \mu_B(y) \oslash \mu_A(x) \ge \mathbf{I}(y \mid w) \oslash \mathbf{I}(x \mid w),$$

for every world $w \vdash \mathcal{E}$.

In Zadeh's original formulation, made within the context of a calculus based on the minimum function as the T-norm, conditionals were, however, formally translated by means of the pseudoinverse of the Łukasiewicz T-norm. Certain formal problems associated with such a combination were pointed out by Trillas and Valverde [42], who developed translations consistent with the T-norm used as the basis for the possibilistic calculus.
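The dependence of the translated rule on the chosen T-norm can be seen by computing the pseudoinverse for three common T-norms. The Python sketch below uses the standard closed forms of the pseudoinverse (residuum) of the minimum, product, and Łukasiewicz T-norms; the function names and the sample membership values are hypothetical, and the argument order follows the convention a ⊘ b = sup{c : b ⊛ c ≤ a} assumed from Section 4.1.

```python
def residuum_min(a, b):
    """a (/) b for the minimum T-norm: sup{c : min(b, c) <= a}."""
    return 1.0 if b <= a else a

def residuum_product(a, b):
    """a (/) b for the product T-norm: sup{c : b * c <= a}."""
    return 1.0 if b <= a else a / b

def residuum_lukasiewicz(a, b):
    """a (/) b for the Lukasiewicz T-norm: sup{c : max(0, b + c - 1) <= a}."""
    return min(1.0, 1.0 - b + a)

# Assumed membership degrees for one pair (x, y) of a rule "If X is A, then Y is B".
mu_A_x, mu_B_y = 0.75, 0.5

# Conditional possibility Poss(y|x) = mu_B(y) (/) mu_A(x) under each calculus.
for name, residuum in [("min", residuum_min),
                       ("product", residuum_product),
                       ("Lukasiewicz", residuum_lukasiewicz)]:
    print(name, residuum(mu_B_y, mu_A_x))
```

All three functions return 1 whenever the consequent degree is at least the antecedent degree, and differ only in how strongly they penalize a shortfall, which is why mixing a minimum-based calculus with the Łukasiewicz pseudoinverse is a modeling decision rather than a notational detail.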

Using the characterization of conditionals introduced in Section 4.5, this relation may also be thought of as a measure of the degree by which a possibility for Y exceeds a fraction (measured by the conditional possibility distribution) of a given possibility distribution for X. In particular, whenever Poss(y|x) = 1, then $\mu_B(y) \ge \mu_A(x)$, indicating the possible existence (since Poss(y|x) is only an upper bound of $\mathbf{I}(y \mid w) \oslash \mathbf{I}(x \mid w)$) of an evidential world such that $w \vdash x$ and $w \vdash y$, with x in A and y in B.

As illustrated in Figure 6, where it has been assumed that the underlying metric (i.e., dissimilarity) is proportional to the Euclidean distance in the plane, the core of the corresponding conditional
possibility  distribution  is  an  (upper)  approximant  of  a  classical  compatibility  relation  (indicated  by 
the  shaded  area  in  the  figure)  that  fans  outward  from  the  Cartesian  product  of  the  cores  of  A  and  B. 
If  this  interpretation  is  taken,  whenever  several  such  rules  are  available,  then  each  one  of  these  rules 
will  lead  to  a  separate  possibility  distribution.  Combination  of  these  upper  bounds  by  minimization 
results  in  a  sharper  possibility  estimate  that  represents  the  “integrated”  effect  of  the  rule  set. 

The second interpretation of conditional relations, leading to a wide variety of practical applications [41],
was utilized by Mamdani and Assilian to develop fuzzy controllers. The basic idea
underlying this explanation follows an approach originally outlined by Zadeh [47,48,51]. In this case,
a number of conditional statements of the form

If X is $A_k$, then Y is $B_k$, k = 1, 2, ..., n,

are given as a combined "disjunctive" description of the relation between X and Y, rather than
as a set of independently valid rules. The purpose of this rule set is the approximation of the
compatibility relation by a "fuzzy curve" generated by disjunction of all the rules in the set, as
shown in Figure 7.

11 A rather encompassing account of potential fuzzy reasoning mechanisms can be found in a paper by Mizumoto,
Fukami, and Tanaka [27].

Figure 6: Rules as Possibilistic Approximants of a Compatibility Relation

Figure 7: Rule-Sets as Possibilistic Approximants of a Compatibility Relation

Recalling the characterization of conditioning as an extension of a classical compatibility relation,
we may say that the core of the compatibility relation is approximated from above by the union

$$\bigcup_{k=1}^{n} \left[ \mathrm{core}(A_k) \times \mathrm{core}(B_k) \right]$$

of the Cartesian products of the cores of the fuzzy sets $A_k$ and $B_k$. In this case the multiple rules
are meant to approximate some region of possible (X, Y) values, and the result of application of
individual component rules must be combined using maximization to produce a conditional possibility
function. We may say, therefore, that under the Zadeh-Mamdani-Assilian (ZMA) interpretation,
the function

$$\mathrm{Poss}(y|x) = \sup_{k} \left[ \min\left( \mu_{A_k}(x), \mu_{B_k}(y) \right) \right]$$

is a conditional possibility for Y given X.

It is important to note that the two interpretations of fuzzy rules that we have just examined
are based on different approaches to the approximation (from above) of the value

$$\sup_{w \vdash \mathcal{E}} \left[ I(y \mid w) \oslash I(x \mid w) \right],$$

being, in the case of the Zadeh-Trillas-Valverde (ZTV) method, the result of the conjunction of
multiple fuzzy relations such as that illustrated in Figure 8, while, in the case of the ZMA logic, the
construction requires disjunction of relations such as that illustrated in Figure 9.
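The contrast between the two interpretations can be made concrete with a small numerical sketch. The triangular membership functions, the two-rule set, and all function names below are illustrative assumptions, as is the use of the Lukasiewicz pseudoinverse $\mu_B(y) \oslash \mu_A(x)$ as the per-rule bound in the ZTV case; the ZMA case follows the sup-min expression for Poss(y|x) stated earlier in this section.

```python
def lukasiewicz_pseudoinverse(a, b):
    """a (/) b = sup{c : b (*) c <= a} for the Lukasiewicz T-norm."""
    return min(1.0, 1.0 - b + a)


def triangular(center, half_width):
    """Hypothetical triangular membership function peaking at `center`."""
    def mu(u):
        return max(0.0, 1.0 - abs(u - center) / half_width)
    return mu


# Hypothetical rule set: "If X is A_k, then Y is B_k", k = 1, 2.
rules = [
    (triangular(0.0, 1.0), triangular(0.0, 1.0)),
    (triangular(1.0, 1.0), triangular(2.0, 1.0)),
]


def poss_ztv(y, x):
    """ZTV: conjunction (min) of the per-rule bounds mu_B(y) (/) mu_A(x)."""
    return min(lukasiewicz_pseudoinverse(mu_b(y), mu_a(x)) for mu_a, mu_b in rules)


def poss_zma(y, x):
    """ZMA: disjunction (max) of the per-rule patches min(mu_A(x), mu_B(y))."""
    return max(min(mu_a(x), mu_b(y)) for mu_a, mu_b in rules)
```

In this sketch, at a point where no antecedent applies (for instance x = 3, y = 5) the conjunctive ZTV bound remains 1, since no rule imposes a constraint, while the disjunctive ZMA value is 0, since no rule patch covers the point. This is one way in which the same rule set produces very different relations under the two readings.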

The difference between both approaches when combining several rules is illustrated also in Figures
10 and 11, showing the contour plots for the α-cuts of the fuzzy relations that are obtained
in a simple example involving four rules. In these figures, the rectangles with a dark outline correspond
to the Cartesian products of the cores of the antecedents $A_k$ and consequents $B_k$. Darker shades of gray
correspond to higher degrees of membership.

The  reader  should  be  cautioned,  however,  about  the  potential  for  invalid  comparisons  that  may 
result  from  hasty  examination  of  these  figures.  Each  formalism  should  be  regarded  as  a  procedure  for 
the  approximation  of  a  compatibility  relation  that  is  based  on  a  different  approach  for  the  description 
of  relationships  between  variables.  In  the  case  of  the  ZMA  interpretation,  the  intent  is  to  generalize 
the  interpolation  procedures  that  are  normally  employed  in  functional  approximation.  As  such,  this 
approach  may  be  said  to  be  inspired  by  the  methodology  of  classical  system  analysis.  The  ZTV 
approach,  by  contrast,  is  a  generalization  of  classical  logical  formulations  and  may  be  regarded, 
from  a  relational  viewpoint,  as  a  procedure  to  describe  a  function  as  the  locus  of  points  that  satisfies 
a  set  of  constraints  rather  than  as  a  subset  of  “fuzzy  points”  of  a  Cartesian  product. 

Figures 10 and 11, while showing that the same rule sets would lead to radically different results,
should not be considered, therefore, to discredit interpolative approaches, as such techniques, proceeding
from a different perspective, should normally be based on rule sets that are different from
those utilized when rules are thought of as independent constraints.

Figure 10: Contour Plots for a Rule Set (ZTV)

Figure 11: Contour Plots for a Rule Set (ZMA)




6  THE  NATURE  OF  SIMILARITY  RELATIONS 


In  this  closing  section,  we  will  examine  issues  that  arise  naturally  from  our  previous  examination  of 
the  role  of  similarities  as  the  semantic  bases  for  possibility  theory. 

Our  discussion  focuses  on  two  topics.  We  look  first  at  the  requirements  that  our  theory  imposes 
upon  the  nature  of  the  scales  used  to  measure  proximity  or  resemblance  between  possible  worlds. 
Finally,  our  examination  of  the  interplay  between  similarities  and  possibilities  turns  to  issues  related 
to  the  generation  of  similarity  relations  from  such  sources  as  domain  knowledge  that  describes 
significant  relations  between  system  variables. 

6.1  On  Similarity  Scales 

Our  previous  interpretation  of  possibilistic  concepts  and  structures  has  been  based  on  the  use  of 
measures of proximity that quantify interobject resemblance using real numbers between 0 and 1.
Our  assumptions  about  the  use  of  the  [0, 1]  interval  as  a  similarity  scale  have  been  made  primarily, 
however,  as  a  matter  of  convenience  so  as  to  simplify  the  description  of  our  model  while  being 
consistent  with  the  customary  definitions  of  possibility  and  necessity  distributions  as  functions  taking 
values  in  that  interval. 

Close  examination  of  the  actual  requirements  imposed  upon  our  similarity  scales  reveals,  however, 
that  our  measurement  domain  may  be  quite  general  so  as  to  include  symbolic  structures  such  as 

{identical, very similar, ..., completely dissimilar}.

Our model is based on the use of a partially ordered set having a maximal and a minimal element
that measure identity and complete dissimilarity, respectively. Furthermore, we have assumed the
existence of a binary operation (the triangular norm $\circledast$) mapping pairs of possible worlds into real
numbers, with certain desirable order-preserving and transitive properties. The concept of triangular
norm, however, does not rely substantially on the use of real numbers as its range and may be readily
extended to more general partially ordered sets with maximal and minimal elements.

We have also assumed a continuity property for the triangular norm operation. This property,
however, simply requires that a notion of proximity also exist among similarity values so as to
provide a form of (order-consistent) topology in that space. While, in general, more precise scales
will result in more detailed representations of interworld similarity, it is important to stress that the
similarity-based model presented here does not rely on "denseness" assumptions such as the existence
of an intermediate value c between any two different values a and b in the similarity-measurement scale.
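As an illustration of a non-dense, symbolic scale, the following sketch defines a finite chain of similarity levels and checks the usual triangular-norm axioms on it by exhaustive enumeration. The five level names (only "identical", "very similar", and "completely dissimilar" appear in the text), the two candidate operations, and the helper names are assumptions made for the example.

```python
from enum import IntEnum
from itertools import product


class Sim(IntEnum):
    """Hypothetical five-level similarity scale (a finite chain)."""
    COMPLETELY_DISSIMILAR = 0
    SOMEWHAT_SIMILAR = 1
    SIMILAR = 2
    VERY_SIMILAR = 3
    IDENTICAL = 4


TOP = Sim.IDENTICAL


def t_min(a, b):
    """Zadeh-style norm on the chain: the lesser of the two levels."""
    return Sim(min(a, b))


def t_bounded(a, b):
    """Lukasiewicz-style norm on the chain: max(a + b - top, bottom)."""
    return Sim(max(int(a) + int(b) - int(TOP), 0))


def is_triangular_norm(t):
    """Check commutativity, associativity, monotonicity, and identity."""
    levels = list(Sim)
    for a, b, c in product(levels, repeat=3):
        if t(a, b) != t(b, a):
            return False
        if t(a, t(b, c)) != t(t(a, b), c):
            return False
        if b <= c and t(a, b) > t(a, c):
            return False
    return all(t(a, TOP) == a for a in levels)
```

Both operations pass the check even though no intermediate value exists between adjacent levels, which is consistent with the remark that denseness of the scale is not required.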

From a practical viewpoint, the major requirement is to quantify proximity in such a way as to
be able to determine that two quantities are similar to some degree (i.e., approximate matching).
The degree of precision that such a matching entails is problem-dependent and will be typically the
result of conflicting impositions between the desire, on one hand, to keep granularity relatively high
to reduce complexity, and the need, on the other, to describe system behavior at an acceptable level
of accuracy. The work of Bonissone and Decker [4] is a significant example of the type of systematic
study that must be carried out to define similarity scales that are both useful and tractable.

6.2 The Origin of Similarity Functions

The  model  of  fuzzy  logic  presented  in  this  note  is  centered  on  the  metric  notion  of  similarity  as  a 
primitive  concept  that  is  useful  to  explain  the  nature  of  possibilistic  constructs  and  the  meaning 
of  possibilistic  reasoning.  In  this  formulation,  similarities  are  defined  as  real  functions  defined  over 
pairs  of  possible  worlds. 

From this perspective, similarities describe relations of resemblance between objects of high complexity,
which, typically, result from consideration of a large number of system variables. Reliance
on  such  complex  structures  has  been  the  direct  consequence  of  a  research  program  that  stressed 
conceptual  clarification  as  its  primary  objective.  In  practice,  however,  it  will  be  generally  difficult 
to  define  complex  measures  that  quantify  similarity  between  complex  objects  on  the  basis  of  a  large 
number  of  criteria. 

Similarities provide the framework that is required to understand approximate relations of corelevance,
usually stated as generalized conditional rules. The practical generation of similarity functions
typically proceeds, however, in the opposite direction, from separate statements about limited aspects
of system behavior to general metric structures. Once such resemblance measures are defined,
they  may  be  used  to  express  and  acquire  new  laws  of  system  behavior  determined,  for  example,  from 
historical  experience  with  similar  systems.  Furthermore,  such  similarity  notions  may  be  used  as  the 
basis  for  analogical  reasoning  systems  that  try  to  determine  system  state  on  the  basis  of  similarity 
to  known  cases  [23]. 

Perhaps the simplest mechanism that may be devised to generate complex metrics from simpler
ones is that which starts with measures of resemblance that quantify proximity from a limited
viewpoint. These metrics are usually derived, using a variety of techniques, in unsupervised pattern
classification (or clustering) problems [20]. In many important applications, hierarchical taxonomies
(a feature of many representation approaches in artificial intelligence) may be used, often in connection
with a variety of weighting schemes that quantify branching importance, to generate metrics
that often satisfy the more stringent requirements of an ultrametric [22].
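A minimal sketch of how a taxonomy can induce such a measure follows. The leaves, the tree itself, and the uniform weighting of branches are all assumptions chosen for illustration; the similarity of two leaves is taken to be the fraction of their root-to-leaf path that they share.

```python
from itertools import product

# Hypothetical taxonomy: each leaf is described by its path below the root.
paths = {
    "dog": ("animal", "mammal", "dog"),
    "cat": ("animal", "mammal", "cat"),
    "eagle": ("animal", "bird", "eagle"),
    "oak": ("plant", "tree", "oak"),
}
DEPTH = 3  # every leaf sits at the same depth in this example


def similarity(a, b):
    """Fraction of the path shared by two leaves (uniform branch weights)."""
    shared = 0
    for x, y in zip(paths[a], paths[b]):
        if x != y:
            break
        shared += 1
    return shared / DEPTH


def is_min_transitive():
    """Check S(a, c) >= min(S(a, b), S(b, c)) for every triple of leaves."""
    leaves = list(paths)
    return all(
        similarity(a, c) >= min(similarity(a, b), similarity(b, c))
        for a, b, c in product(leaves, repeat=3)
    )
```

Because two leaves that each share a prefix with a third must share the shorter of those prefixes with one another, the resulting similarity is transitive under the minimum, so that its complement 1 - S satisfies the ultrametric inequality.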

Classification hierarchies such as those may be thought of as sets of general rules, having a particularly
useful structure, that specify interset proximity from relevant, but restricted viewpoints,
eventually providing measures of similarity between variable values (i.e., the "leaves" of the taxonomical
tree). More generally, however, we may expect that sets of possibilistic rules (i.e., a general
knowledge base) defining a general semantic network of corelevance relations may be available as
the source for the determination of interobject proximity. These possibilistic semantic networks
resemble conventional semantic networks in most regards, being more general in that, in addition
to specifying knowledge about system behavior in some subsets of state-space,12 they also specify
characteristics of behavior in neighborhoods of those subsets.

We may think, therefore, that the antecedents of implicational rules define general regions in state
space where existence of relevant knowledge may increase insight through application of inferential
rules. Using Zadeh's terminology, these antecedents define "granules" that identify important regions
of state-space and indicate the level of accuracy (or granularity) that is required to perform effective
system analysis. In this case, the possibilistic granules correspond to fuzzy sets that are used to
specify both what is true in the core of the granule and, with decreasing specificity, what is true
in a nested set (i.e., the α-cuts) of its neighborhoods. The ability to specify behavior using such
a topological structure results in inferential gains that are the direct consequence of our ability
to reason by similarity, an ability that is made possible by the approximate matching property
of the generalized modus ponens. From another perspective yet, the fuzzy granules identified by
possibilistic rules may also be thought of as generalizations of the arbitrary variable sets used in
a variety of artificial intelligence efforts aimed at understanding system behavior using qualitative
descriptions of reality [16].

12 The expression "state-space" is loosely used here to indicate the space defined by all system variables.

A  number  of  heuristics  may  be  easily  formulated  to  integrate  “marginal”  measures  of  resemblance 
into  joint  similarity  relations.  More  generally,  however,  we  may  state  the  problem  of  similarity 
construction  as  that  of  defining  metric  structures  on  the  basis  of  knowledge  of  the  aspects  of  system 
behavior  that  are  important  to  its  understanding — i.e.,  the  previously  mentioned  granules,  which 
define  what  must  be  distinguished.  Since  generally  those  granules  are  fuzzy  sets,  the  relevance  to 
similarity  construction  of  the  following  representation  theorem,  due  to  Valverde,  may  be  immediately 
seen: 


Theorem [Valverde]: A binary function $S$ mapping pairs of objects of a universe of discourse $U$
into [0, 1] is a similarity relation if and only if there exists a family $\mathcal{H}$ of fuzzy subsets of $U$ such
that

$$S(w, w') = \inf_{h \in \mathcal{H}} \min\left[ h(w) \oslash h(w'),\; h(w') \oslash h(w) \right],$$

for all $w$ and $w'$ in $U$, where the infimum is taken over all fuzzy subsets $h$ in the family $\mathcal{H}$.
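The constructive direction of the theorem can be sketched numerically: starting from a family of fuzzy subsets, the expression in the theorem yields a function that can be checked for reflexivity, symmetry, and transitivity. The four-element universe, the two membership tables, and the choice of the Lukasiewicz norm (with its pseudoinverse) are assumptions made for this example only.

```python
from itertools import product


def t_lukasiewicz(a, b):
    """Lukasiewicz triangular norm."""
    return max(0.0, a + b - 1.0)


def pseudoinverse(a, b):
    """a (/) b = sup{c : b (*) c <= a} for the Lukasiewicz norm."""
    return min(1.0, 1.0 - b + a)


# Hypothetical universe and family of fuzzy subsets (the "granules").
universe = ["w1", "w2", "w3", "w4"]
family = [
    {"w1": 1.0, "w2": 0.75, "w3": 0.25, "w4": 0.0},
    {"w1": 0.5, "w2": 0.5, "w3": 1.0, "w4": 0.25},
]


def similarity(w, v):
    """Infimum over the family of min(h(w) (/) h(v), h(v) (/) h(w))."""
    return min(
        min(pseudoinverse(h[w], h[v]), pseudoinverse(h[v], h[w]))
        for h in family
    )


def is_similarity_relation():
    """Check reflexivity, symmetry, and Lukasiewicz transitivity."""
    reflexive = all(similarity(w, w) == 1.0 for w in universe)
    symmetric = all(
        similarity(w, v) == similarity(v, w)
        for w, v in product(universe, repeat=2)
    )
    transitive = all(
        similarity(a, c) >= t_lukasiewicz(similarity(a, b), similarity(b, c))
        for a, b, c in product(universe, repeat=3)
    )
    return reflexive and symmetric and transitive
```

With the Lukasiewicz norm each term reduces to 1 - |h(w) - h(w')|, so two objects are judged similar exactly to the extent that no set in the family separates them.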


Besides its obvious relevance to the generation of similarity relations from knowledge of important
sets in the domain of discourse, Valverde's theorem, which resulted originally from studies in pattern
recognition, is also of potential significance to the solution of knowledge acquisition problems because
of the important relations that exist between learning procedures and structure-discovery
techniques such as cluster analysis.

7 CONCLUSION


This  note  has  presented  a  similarity-based  model  that  provides  a  clear  interpretation  of  the  major 
structures  and  methods  of  possibilistic  logic  using  metric  concepts  that  are  formally  different  from 
the set-measure constructs of probability theory. Regardless of the potential existence, so far unestablished,
of probability-based interpretations for possibilistic structures, this metric model makes
clear  that  there  are  no  compelling  reasons  to  confuse  two  rather  different  aspects  of  uncertainty  into 
a  single  notion  simply  because  one’s  favorite  theoretical  framework,  in  spite  of  its  otherwise  many 
remarkable  virtues,  fails  to  fully  capture  reality. 

Succinctly stated, being in a situation that resembles a state of affairs S does not make S likely or
vice versa. Furthermore, our reference state may not even be possible in the current circumstances
(making it completely unlikely), but we may still find it useful as a comparison landmark. This
use of "impossible" examples as a way to illustrate system behavior is very prevalent in human
culture, being exemplified by such utterances as "he had the strength of a horse and the swiftness
of a swallow," even if it is obvious to all that no such beasts exist other than for such metaphorical
purposes.

The  insight  provided  by  this  model  makes  it  rather  obvious  that  very  little  can  be  gained  by 
continuing to assert a potential (although never revealed) encompassing probabilistic interpretation
for  possibilistic  structures  that,  presumably,  would  render  them  unnecessary  as  serious  objects  of 
scientific  discourse.  In  addition,  and  quite  beyond  whatever  understanding  theory  may  provide,  the 
current  success  of  possibilistic  logic  as  the  basis  for  major  systems  of  important  human  value  [41] 
— often  unmatched  by  other  approaches — should  be  enough  to  convince  those  having  more  pragmatic 
perspectives  as  to  its  utility. 

The  task  for  approximate  reasoning  researchers  is  to  proceed  now  beyond  unnecessary  controversy 
into  the  study  of  the  issues  that  arise  from  models  such  as  the  one  presented  in  this  note.  Among 
such  questions,  further  studies  of  the  relations  between  the  notions  of  possibility,  similarity,  and 
negation and of those between probability and possibility are of major importance.

INTRODUCTION

In this brief communication, we summarize the results of recent research on the conceptual foundations of
fuzzy logic [5]. This research resulted in the formulation of several semantic models that interpret the major
concepts and structures of fuzzy logic in terms of the more primitive notion of resemblance and similarity
between "possible worlds," i.e., the possible states, situations or behaviors of a real-world system. The metric
structures representing this notion of proximity are generalizations of the accessibility relation of modal
logics [1].

Possibilistic reasoning methods may be characterized, by means of our interpretation, as approaches to the
description of the relations of proximity that hold between possible system states that are logically consistent
with existing evidence, and other situations, which are used as reference landmarks. By contrast, probabilistic
methods seek to quantify, by means of measures of set extension, the proportion of the set of possible worlds
where a proposition is true.

Our discussion will focus primarily on the principal characteristics of a model, discussed in detail in a recent
technical note [2], that quantifies resemblance between possible worlds by means of a similarity function that
assigns a number between 0 and 1 to every pair of possible worlds. Introduction of such a function permits
the interpretation of the major constructs and methods of fuzzy logic (conditional and unconditional possibility and
necessity distributions and the generalized modus ponens of Zadeh) on the basis of related metric relationships
between subsets of possible worlds.

THE  APPROXIMATE  REASONING  PROBLEM 

Our semantic model of fuzzy logic is based on two major conceptual structures: the notion of possible
world, which is the basis for our unified view of the approximate reasoning problem [3], and a metric structure
that  quantifies  similarity  between  pairs  of  possible  worlds. 

If a reasoning problem is thought of as being concerned with the determination of the truth-value of a set
of  propositions  that  describe  different  aspects  of  the  behavior  of  a  system,  then  a  possible  world  is  simply  a 
function  (called  a  valuation)  that  assigns  a  unique  truth  value  to  every  proposition  in  that  set  and  that,  in 
addition,  is  consistent  with  the  rules  of  propositional  logic.  The  set  of  all  such  possible  worlds  is  called  the 
universe  of  discourse. 

In any reasoning problem, knowledge about the characteristics of the class of systems being studied
combined with observations about the particular system under consideration restricts the extent of possible
worlds that must be considered to a subset of the universe of discourse, called the evidential set, which will
be denoted $\mathcal{E}$.

The purpose of the inferential procedures utilized in any reasoning problem may be characterized as that
of establishing if, for a given proposition p, either $\mathcal{E} \Rightarrow p$ or $\mathcal{E} \Rightarrow \neg p$, i.e., whether existing evidence implies the
hypothesis or its negation. In approximate reasoning problems, as illustrated in Figure 1, such determination
is, by definition, impossible: there are some possible worlds in the evidential set where the hypothesis is
true and some where it is false.


[Figure 1 labels: worlds consistent with the evidence; hypothesis true; hypothesis false]

Figure 1: The approximate reasoning problem


SIMILARITY  FUNCTIONS  AND  GENERALIZED  IMPLICATION 

In the view of fuzzy logic proposed by our model, the purpose of possibilistic methods is the description of
the evidential set by characterization of the resemblance relations that hold between its elements and elements
of other sets used as reference landmarks.

To represent similarity or resemblance between possible worlds we introduce a binary function $S$ that
assigns a value between 0 and 1 to every pair of possible worlds $w$ and $w'$. A value of $S$ equal to 1 means that
$w$ and $w'$ are identical, while a value of $S$ equal to 0 indicates that knowledge of propositions that are true in
one possible world does not provide any indication about the nature of the propositions that are true in the
other.

In addition to the above requirement of reflexivity, i.e., $S(w, w) = 1$, we will need to impose additional
axioms to assure that $S$ captures the semantics of a similarity relation. In addition to assuming that $S$
is symmetric, i.e., $S(w, w') = S(w', w)$, we will also require that $S$ satisfies a form of transitivity that is
motivated by noting that if $w$, $w'$ and $w''$ are possible worlds and if $w$ is highly similar to $w'$ and $w'$ is highly
similar to $w''$, then it would be surprising if $w$ and $w''$ were highly dissimilar. This consideration indicates
that knowledge of $S(w, w')$ and $S(w', w'')$ should provide a lower bound for values of $S(w, w'')$, as expressed
by the inequality

$$S(w, w'') \ge S(w, w') \circledast S(w', w''),$$

where $\circledast$ is a binary operator used to represent a real function that produces the required bound. If reasonable
requirements are imposed upon the function $\circledast$, it is easy to show that it has the properties of triangular norms,
a class of functions that play a major role in multivalued logics [4].

The generalized transitivity property expressed by the above inequality may be easier to understand as
a classical triangular inequality if it is noted that the function $\delta = 1 - S$ has the properties of a metric.
When $\circledast$ is the Lukasiewicz norm $a \circledast b = \max(a + b - 1, 0)$, then the transitivity property of $S$ is equivalent
to the well-known triangular property of distance functions. If $\circledast$ corresponds to the Zadeh triangular norm
$a \circledast b = \min(a, b)$, then $\delta$ may be shown to satisfy the more stringent ultrametric inequality.
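The correspondence between transitivity of a similarity function and the metric properties of its complement can be checked on small examples. In the sketch below, the two similarity matrices (one derived from points on a line, one resembling a two-level hierarchy) and the function names are assumptions introduced for illustration.

```python
from itertools import product


def t_lukasiewicz(a, b):
    """Lukasiewicz triangular norm."""
    return max(a + b - 1.0, 0.0)


def t_zadeh(a, b):
    """Zadeh (minimum) triangular norm."""
    return min(a, b)


def is_transitive(S, tnorm):
    """Check S[a][c] >= tnorm(S[a][b], S[b][c]) for every triple."""
    idx = range(len(S))
    return all(
        S[a][c] >= tnorm(S[a][b], S[b][c]) for a, b, c in product(idx, repeat=3)
    )


def satisfies_triangle(S):
    """Triangle inequality for the dissimilarity delta = 1 - S."""
    idx = range(len(S))
    return all(
        (1 - S[a][c]) <= (1 - S[a][b]) + (1 - S[b][c])
        for a, b, c in product(idx, repeat=3)
    )


def satisfies_ultrametric(S):
    """Ultrametric inequality for the dissimilarity delta = 1 - S."""
    idx = range(len(S))
    return all(
        (1 - S[a][c]) <= max(1 - S[a][b], 1 - S[b][c])
        for a, b, c in product(idx, repeat=3)
    )


# Hypothetical similarity derived from distances between points on a line.
points = [0.0, 0.25, 0.5, 1.0]
S_line = [[1.0 - abs(x - y) for y in points] for x in points]

# Hypothetical similarity with a hierarchical (two-level) structure.
S_tree = [[1.0, 0.75, 0.25], [0.75, 1.0, 0.25], [0.25, 0.25, 1.0]]
```

The line-based example is transitive under the Lukasiewicz norm but not under the minimum, and its complement is correspondingly a metric but not an ultrametric; the hierarchical example satisfies both pairs of conditions.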

The correspondence between propositions and subsets of possible worlds simplifies the interpretation of
the classical rule of modus ponens as a rule of derivation based on the transitive property of set inclusion. If
three propositions p, q and r are such that the set of possible worlds where p is true is a subset of the set of
possible worlds where q is true, and if such set is itself a subset of the set of worlds where r is true, then the
modus ponens simply states that the set of p-worlds is a subset of the set of r-worlds.

The conventional relation of set inclusion, based on the binary truth-value structure of classical logic,
allows one only to state that a set of possible worlds is a subset of another or that it is not. Introduction of
a metric structure in the universe of discourse, however, permits the quantification of the degree by which
a set is included in another. Every set of possible worlds, as illustrated in Figure 2, is a subset of some
neighborhood of any other set. The minimal amount of "stretching" that is required to include a set of possible
worlds q in a neighborhood of a set of possible worlds p, given by the expression

$$I(p \mid q) = \inf_{w' \vdash q} \, \sup_{w \vdash p} S(w, w'),$$

is called the degree of implication.

Figure 2: Degree of implication

The degree of implication function has the important transitive property expressed by

$$I(p \mid q) \ge I(p \mid r) \circledast I(r \mid q),$$

which is the basis of the generalized modus ponens of Zadeh. As illustrated in Figure 3, this important rule of
derivation tells us how much the set of p-worlds should be stretched to encompass q on the basis of knowledge
of the sizes of the neighborhoods of p that include r and of r that include q.


Figure  3:  The  generalized  modus  ponens 

A notion dual to the degree of implication is that of degree of consistence, which quantifies the amount
by which a set must be stretched to intersect another, and that is given by the expression

$$C(p \mid q) = \sup_{w' \vdash q} \, \sup_{w \vdash p} S(w, w').$$
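For a finite universe, the degrees of implication and consistence reduce to nested minima and maxima and can be computed directly. The sketch below assumes a hypothetical universe of five worlds placed on the unit interval, the similarity S(w, w') = 1 - |w - w'|, and the Lukasiewicz norm; propositions are represented by the sets of worlds where they hold.

```python
def S(w, v):
    """Hypothetical similarity between worlds placed on the unit interval."""
    return 1.0 - abs(w - v)


def t_lukasiewicz(a, b):
    """Lukasiewicz triangular norm."""
    return max(a + b - 1.0, 0.0)


def implication(p, q):
    """I(p|q): inf over q-worlds of the sup over p-worlds of S."""
    return min(max(S(w, v) for w in p) for v in q)


def consistence(p, q):
    """C(p|q): sup over q-worlds and p-worlds of S."""
    return max(S(w, v) for w in p for v in q)


# Hypothetical propositions, given as sets of worlds.
p = {0.0, 0.25}
r = {0.25, 0.5}
q = {0.75}
```

In this example I(p|r) = 0.75 and I(r|q) = 0.75, whose Lukasiewicz combination 0.5 coincides with I(p|q) = 0.5, so the transitive bound is attained. The degree of implication equals 1 when the second set is contained in the first, and the degree of consistence equals 1 when the two sets intersect.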

POSSIBILISTIC DISTRIBUTIONS

Although the transitive property of the degree of implication essentially provides the bases for the conceptual
validity of the generalized modus ponens, this rule of derivation is typically expressed by means of
necessity and possibility distributions.

An unconditioned necessity distribution given the evidence $\mathcal{E}$ is any function defined over propositions
that bounds from below the degree of implication function, i.e., any function satisfying the inequality
$\mathrm{Nec}(p) \le I(p \mid \mathcal{E})$. Correspondingly, an unconditioned possibility distribution is any upper bound for the degree of
consistence function, i.e., $\mathrm{Poss}(p) \ge C(p \mid \mathcal{E})$.

The definition of conditional possibility and necessity distributions makes use of a form of inverse of the
triangular norm denoted $\oslash$ and defined by the expression

$$a \oslash b = \sup\{\, c : b \circledast c \le a \,\}.$$
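The supremum in this definition can be approximated on a grid and compared with the closed forms that are commonly quoted for the Lukasiewicz and Zadeh norms. The grid resolution and the function names below are assumptions of the sketch, and the closed forms are stated here as known results rather than taken from the text.

```python
def t_lukasiewicz(a, b):
    """Lukasiewicz triangular norm."""
    return max(a + b - 1.0, 0.0)


def t_zadeh(a, b):
    """Zadeh (minimum) triangular norm."""
    return min(a, b)


def pseudoinverse_grid(tnorm, a, b, steps=1000):
    """Grid approximation of sup{c in [0, 1] : tnorm(b, c) <= a}."""
    return max(
        c / steps for c in range(steps + 1) if tnorm(b, c / steps) <= a
    )


def pseudoinverse_lukasiewicz(a, b):
    """Closed form for the Lukasiewicz norm: min(1, 1 - b + a)."""
    return min(1.0, 1.0 - b + a)


def pseudoinverse_zadeh(a, b):
    """Closed form for the minimum norm: 1 if b <= a, otherwise a."""
    return 1.0 if b <= a else a
```

In both cases the pseudoinverse equals 1 exactly when b does not exceed a, which is the property used when a conditional possibility of 1 is read as an ordering between membership degrees.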

Using this function, it is possible to define conditional possibilistic distributions as follows:

Definition: A function Nec(·|·) is called a conditional necessity distribution for $\mathcal{E}$ if

$$\mathrm{Nec}(q|p) \le \inf_{w \vdash \mathcal{E}} \left( I(q \mid w) \oslash I(p \mid w) \right).$$

Definition: A function Poss(·|·) is called a conditional possibility distribution for $\mathcal{E}$ if

$$\mathrm{Poss}(q|p) \ge \sup_{w \vdash \mathcal{E}} \left( I(q \mid w) \oslash I(p \mid w) \right).$$

GENERALIZED  MODUS  PONENS 

The compositional rule of inference or generalized modus ponens of Zadeh is a generalization of the
corresponding classical rule of inference that may be used even when known facts do not match the antecedent
of a conditional rule. The interpretation provided by our model explains the generalized modus ponens as
an extrapolation procedure that uses knowledge of the similarity between the evidence and a set of possible
worlds p (the antecedent proposition), and of the proximity of p-worlds to q-worlds, to bound the similarity of the
latter to the evidential set. The actual statement of the generalized modus ponens for possibility distributions
in terms of similarity structures makes use of a family $\mathcal{P}$ of satisfiable propositions that partitions the universe
of discourse:

Theorem (Generalized Modus Ponens for Possibility Functions): Let $\mathcal{P}$ be a partition and let q be a proposition.
If Poss(p) and Poss(q|p) are real values, defined for every proposition p in $\mathcal{P}$, such that

$$\mathrm{Poss}(p) \ge C(p \mid \mathcal{E}), \qquad \mathrm{Poss}(q|p) \ge \sup_{w \vdash \mathcal{E}} \left( I(q \mid w) \oslash I(p \mid w) \right),$$

then the following inequality is valid:

$$\sup_{p \in \mathcal{P}} \left[ \mathrm{Poss}(q|p) \circledast \mathrm{Poss}(p) \right] \ge C(q \mid \mathcal{E}).$$

A dual result holds for necessity functions.
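The theorem can be exercised on a small finite model. The sketch below assumes a hypothetical universe of five worlds on the unit interval with S(w, w') = 1 - |w - w'|, the Lukasiewicz norm and its pseudoinverse, and the tightest admissible values for Poss(p) and Poss(q|p) (i.e., the bounds taken with equality); the evidential set, the partition, and the proposition q are likewise invented for the example.

```python
def S(w, v):
    """Hypothetical similarity between worlds placed on the unit interval."""
    return 1.0 - abs(w - v)


def t_lukasiewicz(a, b):
    """Lukasiewicz triangular norm."""
    return max(a + b - 1.0, 0.0)


def pseudoinverse(a, b):
    """a (/) b = sup{c : b (*) c <= a} for the Lukasiewicz norm."""
    return min(1.0, 1.0 - b + a)


def implication_at(p, w):
    """I(p|w): the largest similarity between w and any p-world."""
    return max(S(u, w) for u in p)


# Hypothetical evidential set, partition of the universe, and consequent.
evidence = {0.25, 0.5}
partition = [{0.0, 0.25}, {0.5}, {0.75, 1.0}]
q = {1.0}


def poss(p):
    """Tightest unconditioned possibility: C(p|E) = sup over E of I(p|w)."""
    return max(implication_at(p, w) for w in evidence)


def poss_cond(q, p):
    """Tightest conditional possibility: sup over E of I(q|w) (/) I(p|w)."""
    return max(
        pseudoinverse(implication_at(q, w), implication_at(p, w))
        for w in evidence
    )


# Left-hand side of the theorem: sup over the partition of Poss(q|p) (*) Poss(p).
gmp_bound = max(t_lukasiewicz(poss_cond(q, p), poss(p)) for p in partition)
# Right-hand side: the degree of consistence C(q|E).
consistence_q = poss(q)
```

Here the combined bound evaluates to 0.75 while C(q|E) is 0.5, so the generalized modus ponens produces a valid, though not tight, possibility estimate for q.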
Abstract

This  paper  addresses  a  number  of  fundamental  issues  on  the  nature  of  the  concepts 
and  structures  of  fuzzy  logic,  focusing,  in  particular,  on  the  conceptual  and  functional 
differences  that  exist  between  probabilistic  and  possibilistic  approaches. 

A semantic model provides the basic framework allowing definition of possibilistic
structures and concepts by means of a function that quantifies proximity, closeness, or
resemblance between pairs of possible worlds. The resulting model is a natural extension,
based on multiple conceivability relations, of the modal logic concepts of necessity
and possibility. By contrast, typical, chance-oriented, probabilistic concepts and structures
rely on measures of set extension that quantify the proportion of possible worlds
where a proposition is true.

Resemblance between possible worlds is quantified by a generalized similarity relation,
i.e., a function that assigns a number between 0 and 1 to every pair of possible
worlds. Using this similarity relation, which is a form of numerical complement of a classic
metric or distance, the major constructs and methods of fuzzy logic (conditional
and unconditional possibility and necessity distributions and the generalized modus
ponens of Zadeh) are defined and interpreted.

1 Introduction


In  this  paper,  we  present  a  semantic  model  of  the  major  concepts,  structures,  and  methods 
of  fuzzy  or  possibilistic  [16,17]  logic.  This  model  is  based  on  a  framework  that  combines 
the  notion  of  possible  world  [2]  (i.e.,  a  potential  state  or  situation  of  a  real-world  system) 
with  measures  of  proximity  or  resemblance  between  pairs  of  possible  worlds.  The  resulting 
structures  are  substantially  different  in  character  and  nature  from  those  of  probabilistic 
reasoning,  which  are  based  on  measures  of  set  extension,  used  to  quantify  the  proportion  of 
possible  worlds  where  a  given  proposition  is  true. 

The results reported in this paper are the latest in a continuing investigative effort aimed
at clarifying basic conceptual similarities and differences between a number of approaches
to the treatment of imprecision and uncertainty. Also using possible-world semantic models,
prior research has established that the Dempster-Shafer calculus of evidence may be interpreted
by structures that result from the combination of conventional probabilistic calculus
with epistemic logics [9]. By contrast, the formal structures discussed herein clearly show
that fuzzy logic may be understood in a straightforward fashion using conventional metric
notions in a space of possible worlds without resorting in any form to probabilistic concepts.
Furthermore, the actual functions that are used to combine possibilistic knowledge
are substantially different from those used in the probability calculus.

Our exposition, which will be limited to the major structures of fuzzy logic, defines
possibilistic concepts using a more primitive notion that has been found to be an essential
component of important human cognitive processes [14]. The notion of similarity, in spite
of its importance in reasoning processes, has not received substantial attention in treatments
based on the use of logical concepts.

Perhaps as a consequence of its reliance on methods for the manipulation of symbolic
strings and on a single (partial order) relation between formulas (i.e., implication) as the
basis for almost all of its techniques and procedures, formal logic has given little attention
to other formal structures that capture important features of human knowledge,
such as the resemblance that exists between situations or circumstances.
For example, stating that Mary is worth $1,000,000 as opposed to saying that
she is worth $999,999 may be rather inconsequential in terms of the implication of either
statement to a decision-maker (e.g., one trying to establish a credit line). Yet there is nothing
in the basic framework of logic that makes the second statement any less different from the
first than the statement that Mary is broke (i.e., none of the three statements is logically
consistent with the other two).

The determination and use of similarity information is, however, not only central to all
forms of analogical reasoning but also an essential element in the derivation of physical law.
Formal studies in measurement theory [7] clearly show the role that measures based on similar
behavior play in the derivation of rational measurement schemes, while also explaining the
ubiquitous presence of numeric scales throughout science.

The results presented in this paper show that, when such notions of proximity are formalized
in the context of a possible-worlds model, the major functional structures of fuzzy
logic (possibility and necessity distributions) and its major inferential procedure (the generalized
modus ponens of Zadeh) may be readily explained as a natural extension of classical
logical concepts. In particular, possibility and necessity distributions simply correspond to
best and worst scenarios in a space of possible real-world states, while the generalized modus
ponens [17] is a sound inferential procedure that may be regarded as a form of logical extrapolation
between neighboring situations.

The scope of this paper prevents a detailed discussion of all pertinent results and derivations.
A complete account of the similarity-based model of fuzzy logic presented here is given
in a related technical note [10], which this paper essentially summarizes.

2  The  Approximate  Reasoning  Problem 

Our  model  of  the  approximate  reasoning  problem  is  based  on  the  notion  of  “possible  world.” 
Informally,  possible  worlds  are  the  conceivable  states  of  affairs  of  a  real-world  system  that 
are  consistent  with  the  laws  of  logic. 

Restricting ourselves, for the sake of simplicity, to propositional formulations, a possible
world is a function [2] that assigns a unique conventional truth value (i.e., true or false) to
every proposition that describes some relevant aspect of the state of the system and that,
in addition, satisfies the axioms of propositional logic.

In the absence of any knowledge about the behavior of a system of interest or of any
observation about its state, it is impossible to determine which, among all conceivable situations,
corresponds to the actual state of the real world. Availability of factual evidence
or determination of the laws of behavior of the system permits, however, the elimination of some
possible worlds in this universe of discourse from consideration. The remaining possible
worlds correspond to satisfiable propositions that, in addition, are logically consistent with
the evidence. This subset of conceivable situations or scenarios will be called the evidential
set, denoted $\mathcal{E}$.

If  the  typical  reasoning  problem  is  thought  of  as  the  determination  of  the  truth  value  of 
a  proposition  h  (the  hypothesis),  then  an  approximate  reasoning  problem  may  be  described 
as  one  where  available  evidence  does  not  permit  such  evaluation  without  ambiguity.  In  other 
words,  as  illustrated  in  Figure  1,  there  are  some  members  of  the  evidential  set  where  the 
hypothesis  is  true  and  some  where  it  is  false. 

Our approach to the formalization of the major concepts and structures of fuzzy logic
is based on a generalization of a central concept of semantic models of modal
logics. Modal logics [4] may be generally described as extensions of conventional two-valued
logic that permit the meaning of propositional truth to be qualified in various ways.

In our model, we utilize modal concepts to explain basic possibilistic structures using
the more primitive notion of similarity. This notion is introduced, however, by means of
conventional set-theoretic and logical concepts. In this regard, our approach to the study
of the interplay of modal and possibilistic logics is different from approaches such as that
used by Lakoff [6], who sought to generalize modal logics using fuzzy-set concepts, or that of
Dubois and Prade [3], who investigated modal structures with a view to the development of
formal proof mechanisms in possibilistic logic.

[Figure 1: The Approximate Reasoning Problem. Legend: worlds consistent with the evidence;
worlds logically inconsistent with the evidence. Regions: hypothesis true; hypothesis false.]

A  major  concept  of  semantic  models  of  modal  logic  systems  is  a  binary  relation  R,  called
the  accessibility  or  conceivability  relation.  This  relation  is  assumed  to  have  a  number  of 
properties  intended  to  capture  the  semantics  of  various  qualifications  of  propositional  truth, 
ranging  from  logical  necessity  through  the  state  of  knowledge  of  rational  agents  to  concepts 
related  to  the  ideal  behavior  of  ethical  decision-makers. 

Our aim is to characterize the extent to which statements that are true in one situation
or scenario may be said, perhaps with some suitable modification, to be true in another
state of affairs that resembles it. We are particularly interested in describing more general
(i.e., less specific) propositions that are true in one possible world as a function of the
propositions that are true in another. In order to model a continuous range of proximity
between possible worlds, we will generalize the notion of accessibility relation to a full family
of binary relations $R_\alpha$, indexed by a numerical parameter $\alpha$ taking values between 0 and 1,
along the same lines as those used, albeit with a different purpose, by Lewis in his treatment of
counterfactuals [5].

3  Similarity  and  Graded  Possibility 

We will introduce a family of accessibility relations

$$\{R_\alpha : \alpha \in [0,1]\},$$

by means of a binary function $S$, called the similarity relation, that maps pairs of possible
worlds into numbers between 0 and 1. The multiple relations of accessibility $R_\alpha$ are defined
in terms of this similarity function by

$$w \, R_\alpha \, w' \quad \text{if and only if} \quad S(w, w') \ge \alpha, \qquad \alpha \in [0,1].$$

The function $S$ is intended to capture a notion of proximity, closeness, or resemblance
between possible worlds, with a value of 1 corresponding to the identity of possible worlds
and a value of 0 indicating that knowledge of propositions that are true in a possible world
does not provide any indication of the propositions that are true in the other. To assure that
the function $S$ has the semantics of a relation that quantifies resemblance between possible
states of affairs, it is necessary to require that it satisfy a number of properties.

Besides the above-mentioned property that the similarity between a possible world and
itself has the highest possible value, equivalent to stating that each accessibility relation
$R_\alpha$ is reflexive, we will also require that the similarity between different possible worlds be
strictly less than one. This requirement is intended to assure that the similarity relation
can distinguish between different possible worlds.

The similarity relation will also be assumed to be symmetric, and to satisfy a relaxed
form of transitivity. Clearly, if the pairs of possible worlds $(w, w')$ and $(w', w'')$ correspond
to highly similar situations, it would be surprising if $w$ and $w''$ were highly dissimilar. It is
natural to assume, therefore, that

$$S(w, w'') \ge S(w, w') \circledast S(w', w''),$$

where $\circledast$ is a binary operator used to represent the lower bound as a function of its arguments.
This requirement is equivalent to the relaxed transitivity condition

$$R_\alpha \circ R_\beta \subseteq R_{\alpha \circledast \beta},$$

which replaces the usual, more stringent, definition of transitivity.

Imposition of reasonable requirements upon the function $\circledast$ shows that it has the properties
of a triangular norm [11]. These functions, which play a significant role in multivalued
logics [12], may be justified, therefore, purely on the basis of metric considerations. Important
examples of triangular norms are

$$a \circledast b = \min(a, b), \qquad a \circledast b = \max(a + b - 1, 0), \qquad \text{and} \qquad a \circledast b = ab,$$

called the Zadeh, Lukasiewicz, and product triangular norms, respectively.

The generalized transitivity property that is expressed by triangular norms clarifies their
relationship to the conventional mathematical concept of metric. If $S$ is a similarity function,
then the function $\delta = 1 - S$ has the properties of a distance function. When $\circledast$ corresponds
to the Lukasiewicz norm, then the transitivity property of $S$ corresponds to the well-known
triangular property of distance functions. If $\circledast$ corresponds to the Zadeh triangular norm,
then $\delta$ may be shown to satisfy the more stringent ultrametric inequality.
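
The correspondence between triangular norms and distance properties can be checked numerically on a small example. The following is a minimal sketch, not part of the original model: the worlds are assumed to be four hypothetical points on a line with $S = 1 - |x - y|$, a second hand-made matrix serves as an ultrametric case, and all function names are illustrative.

```python
import itertools

TOL = 1e-9  # tolerance for floating-point comparisons

def t_zadeh(a, b):
    """Zadeh (minimum) triangular norm."""
    return min(a, b)

def t_lukasiewicz(a, b):
    """Lukasiewicz triangular norm."""
    return max(a + b - 1.0, 0.0)

def t_product(a, b):
    """Product triangular norm."""
    return a * b

def is_transitive(S, tnorm):
    """Check S[i][k] >= tnorm(S[i][j], S[j][k]) for all triples of worlds."""
    n = len(S)
    return all(S[i][k] + TOL >= tnorm(S[i][j], S[j][k])
               for i, j, k in itertools.product(range(n), repeat=3))

def satisfies_triangle(S):
    """Check the triangle inequality for the distance delta = 1 - S."""
    n = len(S)
    return all((1 - S[i][k]) <= (1 - S[i][j]) + (1 - S[j][k]) + TOL
               for i, j, k in itertools.product(range(n), repeat=3))

def satisfies_ultrametric(S):
    """Check the ultrametric inequality for the distance delta = 1 - S."""
    n = len(S)
    return all((1 - S[i][k]) <= max(1 - S[i][j], 1 - S[j][k]) + TOL
               for i, j, k in itertools.product(range(n), repeat=3))

# Hypothetical worlds placed on a line; similarity is 1 minus their distance.
points = [0.0, 0.3, 0.5, 1.0]
S_line = [[1.0 - abs(x - y) for y in points] for x in points]

# Hypothetical similarity that is transitive under the Zadeh norm.
S_ultra = [[1.0, 0.8, 0.4],
           [0.8, 1.0, 0.4],
           [0.4, 0.4, 1.0]]
```

In this sketch `S_line` is transitive under the Lukasiewicz norm, so its complement obeys the triangle inequality, but it is not transitive under the Zadeh norm; `S_ultra` is transitive under the Zadeh norm and its complement is an ultrametric.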

1  Introduction

The notion of similarity, which plays a major role in human cognitive processes [4], may be
used to formulate a number of semantic models that explain the major concepts of fuzzy
logic [5]. These formalisms show that possibilistic reasoning is fundamentally different from
approaches based on the notion of probability: an additive measure of set extent.

The idea that knowledge of propositions that are true in certain situations may be used to
derive truth values in similar situations has not received much attention in conventional
logical treatments. This state of affairs may be traced to the reliance of logical methods on
symbolic procedures that only recognize one important relationship between formulas, i.e.,
the partial order defined by implication.

This paper briefly describes one model that explains possibilistic structures (possibility
and necessity distributions) and the major derivational rule of fuzzy logic (the generalized
modus ponens) in terms of simpler concepts related to notions of resemblance between
possible worlds. In particular, the latter procedure is shown to generalize its classical
counterpart by allowing a form of logical extrapolation between similar situations and
scenarios. A full discussion of this model and its implications is presented in a related
technical note [2].

2  The  Approximate  Reasoning  Problem 

Our model is based on a unified view of approximate reasoning methodologies that regards
these procedures as techniques that describe certain properties of subsets of possible
worlds. Informally, possible worlds are the conceivable states (i.e., scenarios, situations) of
a real-world system that are consistent with the laws of logic. Restricting ourselves to
propositional formulations, a possible world is a function that assigns a unique truth value
(i.e., true or false) to every proposition that describes a relevant aspect of system state and
behavior and that, in addition, satisfies the axioms of propositional logic.

The set of all such possible worlds is called the universe of discourse. Knowledge about the
class of systems being studied, combined with observations about the actual system under
consideration, usually restricts the set of states that must be considered in an approximate
reasoning problem to a proper subset of this universe. This subset, denoted $\mathcal{E}$, is
called the evidential set.

In a typical approximate reasoning problem, as illustrated in Figure 1, available evidence
does not permit us to determine whether a hypothesis of interest is true or false. Being unable
to determine such a truth value, approximate reasoning methods try to describe significant
properties of the evidential set. Possibilistic techniques describe the relations of similarity
that hold between possible worlds in the evidential set and possible worlds in other sets,
used as reference landmarks.

3  Similarity  Relations 

To capture the notion of proximity or resemblance between possible worlds, we will introduce
a function $S$ that assigns a number between 0 and 1 to every pair of possible worlds. This
function permits the definition of a family of relations between possible worlds that
generalizes the classical modal notion of accessibility [1]. By assumption, $S$ attains a value
of 1 only when its two arguments are identical. A value of 0, by contrast, is intended to
model the fact that knowledge of propositions that are true in one possible world does not
provide any indication about propositions that are true in the other.

Figure 1: The approximate reasoning problem

The similarity relation will also be assumed to be symmetric and to satisfy a relaxed form
of transitivity, intended to capture the notion that the similarity between two possible
worlds $w$ and $w''$ bears some relation to the values of the similarities between each of
them and a third world $w'$, expressed by the inequality

$$S(w, w'') \ge S(w, w') \circledast S(w', w''),$$

where $\circledast$ is a binary operator defined for pairs of numbers in [0,1]. Imposition of
reasonable requirements upon the function $\circledast$ shows that it has the properties of a
triangular norm [3].

In what follows, we will also need a form of inverse of the triangular norm $\circledast$,
denoted $\oslash$, and defined by the expression

$$a \oslash b = \sup\{c : b \circledast c \le a\}.$$

4  Degree  of  Implication  and  Degree  of  Consistence

The classical rule of modus ponens may be thought of as expressing the transitive property
of subset inclusion. Introducing a metric relation and its associated topology permits the
extension of this relation by measures that quantify the size of the neighborhood of a set
that contains another set. We will say that $q$ implies $p$ to the degree $\alpha$ if, for every
$q$-world $w$, there exists a $p$-world $w'$ such that $S(w, w') \ge \alpha$. Since
$S(w, w') \ge 0$ for every pair of possible worlds, it is obvious that any proposition implies
any other proposition to some degree.

Informally, the definition of graded implication means that if $p$ is stretched to the degree
$\alpha$, then this stretched set will include $q$. The upper bound of the values $\alpha$ such
that $q$ implies $p$ to the degree $\alpha$, expressed by

$$I(p \mid q) = \inf_{w \vdash q} \, \sup_{w' \vdash p} S(w, w'),$$

where $w \vdash q$ indicates that $w$ is a $q$-world, defines a function $I$ called the degree
of implication. The degree of implication, which is related to the notion of Hausdorff
distance, has the transitive property

$$I(p \mid q) \ge I(p \mid r) \circledast I(r \mid q),$$

which is the basis of the generalized modus ponens of Zadeh, illustrated in Figure 2.

Figure 2: The generalized modus ponens

A notion that is dual to that of the degree of implication is the degree of consistence, which
quantifies the amount by which a set must be stretched in order to intersect another set,

$$C(p \mid q) = \sup_{w \vdash q} \, \sup_{w' \vdash p} S(w, w').$$

Obviously,

$$I(p \mid q) \le C(p \mid q).$$
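
On a finite universe the infimum and supremum in these definitions reduce to a minimum and a maximum, so both degrees can be computed directly. The sketch below is illustrative only: it assumes four hypothetical worlds on a line with $S = 1 - |x - y|$ (which is transitive under the Lukasiewicz norm), represents propositions as non-empty sets of world indices, and uses function names that do not come from the paper.

```python
import itertools

def t_lukasiewicz(a, b):
    """Lukasiewicz triangular norm."""
    return max(a + b - 1.0, 0.0)

def implication_degree(S, p, q):
    """I(p | q): min over q-worlds w of the max over p-worlds v of S[w][v]."""
    return min(max(S[w][v] for v in p) for w in q)

def consistence_degree(S, p, q):
    """C(p | q): max over q-worlds w of the max over p-worlds v of S[w][v]."""
    return max(max(S[w][v] for v in p) for w in q)

# Hypothetical worlds on a line; similarity is 1 minus their distance.
points = [0.0, 0.3, 0.5, 1.0]
S = [[1.0 - abs(x - y) for y in points] for x in points]

# All non-empty propositions over the four worlds.
worlds = range(len(points))
propositions = [frozenset(c)
                for r in range(1, len(points) + 1)
                for c in itertools.combinations(worlds, r)]

def transitivity_holds(S, tnorm, tol=1e-9):
    """Check I(p|q) >= tnorm(I(p|r), I(r|q)) for every triple of propositions."""
    return all(
        implication_degree(S, p, q) + tol
        >= tnorm(implication_degree(S, p, r), implication_degree(S, r, q))
        for p, q, r in itertools.product(propositions, repeat=3)
    )

p, q = frozenset({0, 1}), frozenset({1, 2})
i_pq = implication_degree(S, p, q)   # how far p must be stretched to cover q
c_pq = consistence_degree(S, p, q)   # how far p must be stretched to touch q
```

Here `p` and `q` share world 1, so the degree of consistence is 1, while the degree of implication is limited by world 2, which lies at distance 0.2 from the nearest `p`-world.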

5  Possibility  and  Necessity  Distributions 

An unconditioned necessity distribution for $\mathcal{E}$ is any function $\mathrm{Nec}(\cdot)$
mapping propositions (i.e., subsets of possible worlds) into numbers between 0 and 1, such
that

$$\mathrm{Nec}(p) \le I(p \mid \mathcal{E}),$$

i.e., a lower bound of the degree of implication of $p$ by $\mathcal{E}$. Correspondingly, an
unconditioned possibility distribution is an upper bound for the degree of consistence of $p$
and $\mathcal{E}$, i.e.,

$$\mathrm{Poss}(p) \ge C(p \mid \mathcal{E}).$$

Unconditioned necessity and possibility distributions measure how much a set must be
stretched to enclose or intersect, respectively, the evidential set. The conditional
counterparts of these notions characterize the proximity relations that exist between
evidential worlds and worlds satisfying a consequent proposition $q$ as a proportion of the
similarity that exists between those evidential worlds and worlds that satisfy the antecedent
proposition $p$.

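A function $\mathrm{Nec}(\cdot \mid \cdot)$ is called a conditional necessity distribution for
$\mathcal{E}$ if

$$\mathrm{Nec}(q \mid p) \le \inf_{w \in \mathcal{E}} \, [\, I(q \mid w) \oslash I(p \mid w) \,].$$

Correspondingly, a function $\mathrm{Poss}(\cdot \mid \cdot)$ is called a conditional possibility
distribution for $\mathcal{E}$ if

$$\mathrm{Poss}(q \mid p) \ge \sup_{w \in \mathcal{E}} \, [\, I(q \mid w) \oslash I(p \mid w) \,].$$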

6  The  Generalized  Modus  Ponens 

The usual statement of the compositional rule of inference or generalized modus ponens of
Zadeh [5] is made in terms of a relationship between unconditioned and conditioned
distributions rather than in its simpler form, given above, as the transitive property of the
degree of implication.

The generalized modus ponens is a sound logical extrapolation procedure that uses
information about the metric relations that hold between different subsets. On the basis of
information about the similarity between evidential worlds and a set of possible worlds
(i.e., the antecedent proposition $p$), and of knowledge about the relative proximity of
$p$-worlds and $q$-worlds (i.e., conditional distributions), the generalized modus ponens
produces bounds for the similarity between evidential worlds and those that satisfy the
consequent proposition $q$ (i.e., unconditioned distributions for the consequent).

The actual formal statement of the generalized modus ponens makes use of the notion of
partition of the universe of discourse. A partition $\mathcal{P}$ simply corresponds to an
ordinary partition of the universe of discourse into disjoint subsets or, equivalently, to a
collection of mutually disjoint propositions such that their disjunction is always true.

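Using this concept, the generalized modus ponens for possibility distributions may be stated
as follows in terms of distributions defined using similarity structures:

Theorem: Let $\mathcal{P}$ be a partition and let $q$ be a proposition. If $\mathrm{Poss}(p)$
and $\mathrm{Poss}(q \mid p)$ are real values, defined for every proposition $p$ in
$\mathcal{P}$, such that

$$\mathrm{Poss}(p) \ge C(p \mid \mathcal{E}),$$

$$\mathrm{Poss}(q \mid p) \ge \sup_{w \in \mathcal{E}} \, [\, I(q \mid w) \oslash I(p \mid w) \,],$$

then the following inequality is valid:

$$\sup_{p \in \mathcal{P}} \, [\, \mathrm{Poss}(q \mid p) \circledast \mathrm{Poss}(p) \,] \ge C(q \mid \mathcal{E}).$$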

A  dual  result  holds  for  necessity  distributions. 
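
The possibilistic form of the theorem can be illustrated on a small finite universe. The sketch below rests on several assumptions that are not in the paper: the worlds are four hypothetical points on a line with $S = 1 - |x - y|$, the triangular norm is the Lukasiewicz norm, whose inverse works out to $a \oslash b = \min(1, 1 - b + a)$, and the tightest admissible values are used for the possibility distributions. All names are illustrative.

```python
def t_lukasiewicz(a, b):
    """Lukasiewicz triangular norm."""
    return max(a + b - 1.0, 0.0)

def inverse_lukasiewicz(a, b):
    """sup{c in [0,1] : t_lukasiewicz(b, c) <= a}, i.e. min(1, 1 - b + a)."""
    return min(1.0, 1.0 - b + a)

def closeness(S, w, prop):
    """I(prop | w): similarity of world w to the nearest world satisfying prop."""
    return max(S[w][v] for v in prop)

def consistence(S, prop, evidence):
    """C(prop | evidence), used here as the tightest unconditioned possibility."""
    return max(closeness(S, w, prop) for w in evidence)

def conditional_possibility(S, q, p, evidence):
    """Tightest conditional possibility of q given p over the evidential worlds."""
    return max(inverse_lukasiewicz(closeness(S, w, q), closeness(S, w, p))
               for w in evidence)

def modus_ponens_bound(S, q, partition, evidence):
    """Left-hand side of the theorem: sup over p of Poss(q|p) combined with Poss(p)."""
    return max(t_lukasiewicz(conditional_possibility(S, q, p, evidence),
                             consistence(S, p, evidence))
               for p in partition)

# Hypothetical worlds on a line; similarity is 1 minus their distance.
points = [0.0, 0.3, 0.5, 1.0]
S = [[1.0 - abs(x - y) for y in points] for x in points]

evidence = {1}                   # evidential set: only world 1 remains possible
partition = [{0, 1}, {2, 3}]     # antecedent propositions covering the universe
q = {3}                          # consequent proposition

bound = modus_ponens_bound(S, q, partition, evidence)
target = consistence(S, q, evidence)   # C(q | evidence)
```

With these choices both sides equal 0.3 up to rounding, which shows that the bound produced by the rule can be attained when the tightest distributions are used.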

7  Conclusion 

Similarity  models  provide  useful  interpretations  for 
the  basic  concepts  of  possibilistic  logic  using  a  more 
primitive  notion  than  that  of  possibility.  In  addi¬ 
tion  to  clearly  showing  that  fuzzy  logic  structures 
are  not  related  to  probabilistic  notions,  the  result¬ 
ing  framework  provides  a  solid  basis  for  the  study 
and  extension  of  possibilistic  logic  in  a  number  of 
directions  of  considerable  practical  importance. 

1  Introduction

If artificially intelligent systems are to produce adequate assessments of the state and
behavior of the real world, they must cope with information and knowledge that are
characterized by varying degrees of uncertainty, ignorance, and correctness. To address this
need, we have developed a technology called evidential reasoning. It is formally based upon
the Dempster-Shafer theory of belief functions; it has been implemented as a
domain-independent automated reasoning system; it has been successfully applied to a range
of real-world problems [2]. Yet, its reliance on belief functions has drawn criticism.

Our choice of an approach based on the Dempster-Shafer theory was not arbitrary. We believe
that it has important methodological advantages such as its ability to represent ignorance
in a direct and straightforward fashion, its consistency with classical probability theory,
its compatibility with Boolean logic, and its manageable computational complexity. At the
same time, we recognize that other approaches may also complement and augment the
assessments provided by evidential reasoning.

We will examine, within the limited scope provided by the format of this paper, several
criticisms of belief functions that have appeared in the literature. We plan, however, a more
thorough discussion of these criticisms in a related volume to be published in connection
with this conference.

We discuss first the fundamental theoretical bases supporting the belief-function approach
and justify its use in terms of the requirements imposed by ignorance of certain probability
distributions. We consider the nature of Dempster's rule of combination and argue that
negative assessments either misinterpret the nature of the distributions being combined or
ignore the basic independence assumptions that assure its validity.

We also respond to critiques based on the computational complexity of the belief-function
approach. Such criticisms claim that the complexity of probabilistic knowledge
representations grows exponentially with the size of the frame, thus making the theory
unsuited for automated reasoning. Other comments addressed in our presentation center on
limitations on the representational ability of belief functions and the lack of certain
methodological capabilities (e.g., decision-making mechanisms).

Despite the criticism that belief functions have drawn, we believe evidential reasoning to be
well-founded and to have practical utility in a broad range of applications.

2  On  Theoretical  Soundness 

The theory of belief functions was originated by Dempster [1] in the context of statistical
research. The use of the term "belief," together with its subjectivist connotations, is due to
Shafer [7], who first applied the theory to the analysis of the information contained in
imprecise and uncertain evidence.

Although much skepticism has been voiced about the naturalness of belief functions and their
agreement with conventional probabilistic approaches, their theoretical bases are provided by
a simple consideration about the role of evidence as a basic information carrier.

In classical probabilistic treatments, it is assumed that, under certain evidential conditions
$\mathcal{E}$, the value $\Pr(p \mid \mathcal{E})$ of the likelihood of a particular statement
$p$ is known. This view of evidence, while adequate to represent the informational conditions
of most controlled experimental setups, fails, however, to adequately model the effects that
acquiring similar information has on our state of knowledge when the state of the world
cannot be so readily controlled.

In such circumstances, whenever the evidence $\mathcal{E}$ is observed, three possible
informational outcomes may result from examination of further information that later turns
out to improve our state of knowledge: either $p$ is found to be true, $\neg p$ is found to be
true (i.e., $p$ is false), or such information is insufficient to determine the truth value of
$p$. Use of modal logic concepts, which are the bases of the formal model of Ruspini [6],
suggests the use of the notation $Kp$, $K\neg p$, and $Ip$ to identify these outcomes. Since
these alternatives are mutually exclusive and exhaustive, it is clear that

$$\Pr(Kp) + \Pr(K\neg p) + \Pr(Ip) = 1.$$

As shown by Ruspini, the function $\mathrm{Bel}(p) = \Pr(Kp)$ has the properties of a belief
function, as axiomatized by Shafer. Furthermore, since it is possible that $\Pr(Ip) > 0$, in
general $\mathrm{Bel}(p) + \mathrm{Bel}(\neg p) \le 1$. This inequality follows naturally,
therefore, from classical probability theory, applied here to considerations about what
Pearl [4] calls the provability of certain propositions.
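
The relation between the three epistemic outcomes and a belief function can be made concrete with a small example. The sketch below is an assumption-laden illustration rather than the formal model of [6]: each epistemic outcome is represented by the set of worlds it leaves possible, the probabilities attached to those sets form a hypothetical mass assignment, and $p$ is known exactly when the remaining set lies inside $p$.

```python
from fractions import Fraction

def belief(mass, prop):
    """Bel(prop): total probability of the outcomes in which prop is known to hold."""
    return sum(m for focal, m in mass.items() if focal <= prop)

# Hypothetical frame of three possible worlds.
frame = frozenset({"a", "b", "c"})

# Hypothetical probabilities of the sets of worlds left possible by the evidence.
mass = {
    frozenset({"a"}): Fraction(1, 2),
    frozenset({"c"}): Fraction(1, 10),
    frozenset({"b", "c"}): Fraction(1, 5),
    frame: Fraction(1, 5),
}

p = frozenset({"a", "b"})
not_p = frame - p

bel_p = belief(mass, p)              # Pr(Kp): p is known to be true
bel_not_p = belief(mass, not_p)      # Pr(K not-p): p is known to be false
ignorance = 1 - bel_p - bel_not_p    # Pr(Ip): truth value of p undetermined
```

In this example `bel_p + bel_not_p` is 3/5, and the remaining 2/5 is the probability that the evidence leaves the truth value of `p` undetermined.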

Similar considerations about the informational effect of independent bodies of evidence,
which are beyond the scope of this short summary, indicate that Dempster's combination
formula is, under its stated assumptions, completely consistent with conventional probability
calculus.

This interpretation quickly disposes of erroneous arguments based on unintended
interpretations of the intervals defined by belief functions. Each such interval represents
ignorance of a single probability value for a fixed proposition $p$ under fixed evidential
conditions $\mathcal{E}$. If critics choose to interpret such intervals as the possible values
that conditional probabilities might attain when further evidence is collected, as suggested
by Pearl [3], belief functions will not, indeed, behave according to such unintended semantics.

3  On  Decision  Support 

A criticism of a more fundamental nature, however, is often raised regarding the
epistemological need for the belief-function approach. Summarized by statements such as
Pearl's [4] question, "why we should concern ourselves with the probability that the evidence
implies A, rather than the probability that A is true, given the evidence?," these arguments
correctly point to the basic knowledge requirement that most decision problems entail: if a
rational choice is to be made, then we must have a proper informational basis for making it.

This obvious consideration is twisted, however, to argue for the necessity to estimate
unknown probability values when they are not available. We do not think that this modified,
or pragmatic necessity, argument is either sound or compelling. To answer Pearl's question,
we concern ourselves with the probability of provability because that is all that our data
and the laws of logic can provide. We would rather measure the probabilities of truth, and
endeavor to do so whenever possible, but we do not think that probabilities, any more than
any other unknown physical parameter value, should be guessed simply because we are
compelled to choose a course of action.

In our view, the belief-function approach may be used in a straightforward fashion to
produce intervals of possible utility values. When such intervals overlap and cannot be
ordered, this fact simply reflects a basic deficiency in our knowledge. We look upon
"pragmatic justifications" with the same concern that any experimental scientist shows about
proposals to guess what he has not measured: the ability to make decisions in the absence of
knowledge is, in our view, a handicap rather than an advantage of a method.
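
One common way to obtain such utility intervals is to weight the worst and best utilities within each focal set by its mass. The sketch below uses that construction as an assumption; the text does not say which construction the authors have in mind, and the frame, masses, actions, and utilities are all hypothetical.

```python
from fractions import Fraction

def utility_interval(mass, utility):
    """Lower and upper expected utility of one action under a mass assignment."""
    lower = sum(m * min(utility[w] for w in focal) for focal, m in mass.items())
    upper = sum(m * max(utility[w] for w in focal) for focal, m in mass.items())
    return lower, upper

def intervals_overlap(first, second):
    """True when neither interval lies entirely above the other."""
    return first[0] <= second[1] and second[0] <= first[1]

# Hypothetical evidence: half the mass on world "a", half on the set {"b", "c"}.
mass = {
    frozenset({"a"}): Fraction(1, 2),
    frozenset({"b", "c"}): Fraction(1, 2),
}

# Hypothetical utilities of two candidate actions in each possible world.
utility_action_1 = {"a": 10, "b": 2, "c": 6}
utility_action_2 = {"a": 6, "b": 12, "c": 4}

interval_1 = utility_interval(mass, utility_action_1)
interval_2 = utility_interval(mass, utility_action_2)
undecided = intervals_overlap(interval_1, interval_2)
```

The two intervals are [6, 8] and [5, 9]; since they overlap, the available evidence does not order the two actions in this example.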

4  On  the  Dempster  Formula 

The Dempster formula is, currently, the principal evidence integration mechanism of the
belief-function approach. It was derived in the context of a basic model of the effect of
probabilistic evidence that correctly interprets such evidence as constraints on probability
values rather than as the source of the actual values, which are typically undetermined.

The formula may be described as an expression that yields bounds
for the conditional probability distribution
$\Pr(\cdot \mid \mathcal{E}_1, \mathcal{E}_2)$ on the basis of
similar bounds for the probability distributions
$\Pr(\cdot \mid \mathcal{E}_1)$ and $\Pr(\cdot \mid \mathcal{E}_2)$,
under certain conditions of independence.

Criticisms of the Dempster formula may be broadly characterized as
being the consequence of two basic misunderstandings about its
validity and generality.

First, the formula is intended to be applied only to those
situations where its underlying assumptions are valid. Alleged
counterexamples such as that of the “three prisoner problem,”
referenced by Pearl [4], fail to satisfy such assumptions and
cannot be correctly said to be theoretical failures. We agree with
Pearl, however, in his criticism of the use of the Dempster formula
to produce a conditioning formula, leading to counterintuitive
results (the “spoiled sandwich” effect), which we consider also to
reflect failure of the basic independence assumptions. We are
endeavoring, however, to extend the original theory to yield
expressions that produce and utilize conditional belief
information [5].

The second type of criticism is based on the erroneous assumption
that the two evidential bodies being combined should be interpreted
as bounds, provided by two independent “experts,” which constrain
the values of the same probability distribution. As was pointed out
above, the formula combines two different conditional probability
distributions.

5  On  Generality  and  Complexity 

The lack of generality of the belief-function approach to represent
interval constraints on a family of probability distributions is
well known. Our reliance on the belief-function approach, in spite
of such lack of generality, is based on two major considerations.

First, our experience shows that, notwithstanding criticisms based
on unrealistic worst-case scenarios, the approach is
computationally efficient. In particular, we have found that
representation of belief functions in terms of basic probabilistic
assignments results in a storage and manipulation scheme that is
both economical and easy to understand. In addition, we have
successfully implemented tools, such as summarization and
coarsening operators, which may be effectively utilized to limit
representational complexity.
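
As an illustration of how a coarsening operator can limit
representational complexity, the following Python sketch maps a mass
function defined on a fine frame onto a coarser one. The text does not
specify its operators, so the function name `coarsen`, the
dictionary-of-frozensets representation, and the rule used (each focal
element is replaced by the set of coarse classes that it meets) are
assumptions made purely for illustration.

```python
def coarsen(mass, partition):
    """Project a mass function onto a coarser frame (illustrative sketch).

    mass:      dict mapping focal elements (frozensets of fine states)
               to their mass values.
    partition: dict mapping each fine state to the label of the coarse
               class that contains it (hypothetical representation).
    """
    coarse = {}
    for focal, weight in mass.items():
        # Image of the focal element: every coarse class it intersects.
        image = frozenset(partition[state] for state in focal)
        # Focal elements with the same image are merged, which is what
        # reduces the number of stored focal elements.
        coarse[image] = coarse.get(image, 0.0) + weight
    return coarse


# Hypothetical fine frame {a, b, c, d}, coarsened into classes X and Y.
partition = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
fine_mass = {
    frozenset({"a"}): 0.25,
    frozenset({"b"}): 0.25,
    frozenset({"a", "c"}): 0.25,
    frozenset({"c", "d"}): 0.25,
}
coarse_mass = coarsen(fine_mass, partition)
```

In this example four focal elements collapse into three; the price paid
is a loss of specificity, since the masses of {a} and {b} can no longer
be told apart.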

Second, our current functional operators have been chosen to
guarantee that probabilistic information will always be capable of
being represented within the scope of the approach, as more general
constraints neither enter into consideration nor appear as the
result of any of its functions.

Our current concerns with the manipulation of conditional and
dependent evidence show, however, that, for some important
problems, the results of evidential combination fall outside the
scope of the representational capabilities of the approach.
Although more general schemes, such as interval probabilities, do
not suffer from this limitation, their inherent complexity
precludes their practical application.

Ongoing research indicates, on the other hand, that the
belief-function approach may be used to approximate the results of
these general evidential combination operations. This research also
shows the basic errors inherent in criticisms that regard the
belief-function approach as a fully developed methodology incapable
of sustaining further enhancement and modification. Having been
studied in depth for only fifteen years, the approach has the
technological status of a young discipline, capable of enhancement
on its own and of combination with other approaches to produce more
general tools for probabilistic reasoning. Far from proving that we
have reached a technological plateau, our investigations indicate
that much is yet to be gained from such a development and
integration process.

Abstract

We address recent criticisms of evidential reasoning, an approach to the analysis of imprecise
and uncertain information that is based on the Dempster-Shafer calculus of evidence.

We show that evidential reasoning can be interpreted in terms of classical probability
theory and that the Dempster-Shafer calculus of evidence may be considered to be a form
of generalized probabilistic reasoning based on the representation of probabilistic ignorance
by intervals of possible values. In particular, we emphasize that it is not necessary to resort
to nonprobabilistic or subjectivist explanations to justify the validity of the approach.

We answer conceptual criticisms of evidential reasoning primarily on the basis of
their confusion of the current state of development of the theory (mainly theoretical
limitations in the treatment of conditional information) with its potential usefulness in treating
a wide variety of uncertainty-analysis problems. Similarly, we indicate that the supposed
lack of decision-support schemes of generalized probability approaches is not a theoretical
handicap but, rather, an indication of basic informational shortcomings that is a desirable
asset of any formal approximate reasoning approach. We also point to potential shortcomings
of the underlying representation scheme to treat general probabilistic reasoning problems.

We also consider methodological criticisms of the approach, focusing primarily on the
alleged counterintuitive nature of Dempster’s combination formula and showing that such results
are the consequence of its misapplication. We also address issues of complexity and validity of
scope of the calculus of evidence.


1  Introduction 


If artificially intelligent systems are to produce adequate assessments of the state and
behavior of the real world, they must cope with information and knowledge that is characterized
by varying degrees of uncertainty, ignorance, and correctness. To address this need, we have
developed a technology called evidential reasoning. It is formally based upon the
Dempster-Shafer theory of belief functions; it has been implemented as a domain-independent
automated reasoning system; it has been successfully applied to a range of real-world
problems [11]. Yet, its reliance on belief functions has drawn criticism.

Our  choice  of  an  approach  based  on  the  Dempster-Shafer  theory  was  not  arbitrary.  We 
believe  that  it  has  important  methodological  advantages  such  as  its  ability  to  represent 
ignorance  in  a  direct  and  straightforward  fashion,  its  consistency  with  classical  probability 
theory,  its  compatibility  with  Boolean  logic,  and  its  manageable  computational  complexity. 
At  the  same  time,  we  recognize  that  other  approaches  may  also  complement  and  augment 
the  assessments  provided  by  evidential  reasoning. 

We  examine  several  criticisms  of  belief  functions  that  have  appeared  in  the  literature, 
discussing  first  the  fundamental  theoretical  bases  supporting  the  belief-function  approach  and 
justifying  its  use  in  terms  of  the  requirements  imposed  by  ignorance  of  certain  probability 
distributions.  We  consider  the  nature  of  Dempster’s  rule  of  combination  and  argue  that 
negative  assessments  either  misinterpret  the  nature  of  the  distributions  being  combined  or 
ignore  the  basic  independence  assumptions  that  assure  its  validity.  We  stress  also  that  it  is 
not  necessary  to  rely  on  explanations  that  are  either  nonprobabilistic  or  subjective  to  justify 
the  validity  of  the  Dempster-Shafer  calculus  of  evidence. 

Furthermore,  we  show  that  certain  apparently  counterintuitive  properties  of  the  approach 
(e.g.,  the  “spoiled  sandwich”  paradox)  are  the  natural  consequence  of  considering  families 
of  possible  probability  distributions  that  solve  an  approximate  reasoning  problem.  In  the 
context  of  this  discussion,  we  indicate  also  the  inherent  pitfalls  of  “axiomatic”  approaches 
that  accept  or  reject  methodologies  on  the  basis  of  their  compliance  with  allegedly  intuitive 
principles. 

We also answer critiques based on the computational complexity of the belief-function
approach. Such criticisms claim that the complexity of probabilistic knowledge
representations grows exponentially with the size of the frame, thus making the theory unsuited for
automated reasoning. Other comments addressed in our presentation center on limitations
on the representational ability of belief functions and the lack of certain methodological
capabilities (e.g., decision-making mechanisms).

Despite  the  criticism  that  belief  functions  have  drawn,  we  believe  that  evidential  reasoning 
is  well-founded  and  that  it  may  be  effectively  applied  to  the  solution  of  a  broad  range  of 
important  practical  problems. 

Most of our comments will be made in direct reply to a recent criticism of the
belief-function approach by Pearl [15], since we feel that his paper encompasses most of the major
worries and concerns with the calculus of evidence. While most of the discussion of this
paper consists of direct responses to issues raised by Pearl and others, our overall objective
is considerably broader. Our answers are motivated by the same remarks of DeGroot, quoted
by Pearl at the conclusion of his work, about the need to use our methodological approaches
“. . . with the utmost care and in accordance with the highest ethical standards.” Our aim,
like Pearl’s, is to enlighten and clarify, through careful discussion of rather subtle and delicate
issues, rather than to engage in dogmatic defense of one approach to the detriment of another.
It is our earnest hope that this work, in conjunction with other evaluations of the
belief-function approach, will help readers to understand its bases, capabilities, and limitations.

2  On  Theoretical  Soundness 

The theory of belief functions was originated by Dempster [4] in the context of statistical
research. The use of the term “belief,” together with its subjectivist connotations, is due to
Shafer [18], who first applied the theory to the analysis of imprecise and uncertain evidence.

Although much skepticism has been voiced about the naturalness of belief functions and
their agreement with conventional probabilistic approaches, their theoretical bases are provided
by a simple consideration about the role of evidence as a basic information carrier.

In classical probabilistic treatments, it is assumed that, under certain evidential
conditions $\mathcal{E}$,¹ the value $\Pr(p \mid \mathcal{E})$ of the likelihood of a particular
statement $p$ is known. This view of evidence, while adequate to represent the informational
conditions of most controlled experimental setups, fails, however, to adequately model the
effects that acquiring similar information has on our state of knowledge when the state of
the world cannot be so readily manipulated.

In such circumstances, whenever the evidence $\mathcal{E}$ is observed, three possible informational
outcomes may result from examination of further information that later turns out to improve
our state of knowledge: either $p$ is found to be true, $\neg p$ is found to be true (i.e., $p$ is false),
or such information is insufficient to determine the truth value of $p$. Use of modal logic
concepts, which are the bases of the formal model of Ruspini [17], suggests the use of the
notation $Kp$, $K\neg p$, and $Ip$ to identify these outcomes. Since these alternatives are exclusive
and exhaustive, it is clear that

$$ \Pr(Kp) + \Pr(K\neg p) + \Pr(Ip) = 1 . $$

Furthermore,  since  the  probability  of  Ip  may  be  positive,  it  will  be  true,  in  general,  that 

$$ \Pr(Kp) + \Pr(K\neg p) \le 1 . $$

This model, based on a combination of classical probability methods and the modal logic
S5 [8,12], essentially provides, through the logical notion of possible world, a meaning
for the unary operator $K$ as the representation of the state of knowledge of a statistician
who is estimating the probability of truth of diverse propositions $\{p, q, \ldots\}$ under evidential
conditions $\mathcal{E}$.

¹ Throughout this paper, the symbol $\mathcal{E}$ is used to denote available evidence, i.e., a collection of
propositions about the real world that are known to be true either as the result of direct observation or as the
consequences of applicable background knowledge.

This  statistician  estimates  those  distributions  by  considering  multiple  samples  of  the 
state  or  behavior  of  a  real-world  system.  Using,  for  each  sample,  additional  information 
collected  through  further  experimentation,  the  statistician  may  then  establish  or  not  the 
validity  of  a  proposition  p.  If  he  is  rather  lucky,  our  statistician  will  find  himself  in  the  ideal 
situation where he can actually “know”² or “prove” that the real world is in a state s that
is  described  to  the  best  level  of  detail  that  is  necessary  to  understand  its  behavior  (i.e.,  a 
“possible  world”).  This  is  the  state  of  knowledge  usually  attained,  under  perfect  laboratory 
conditions,  when  experimental  samples  are  fully  analyzed  and  when  the  outcome  of  such 
analyses  is  classified  in  terms  of  a  set  of  exhaustive  and  mutually  exclusive  alternatives. 

Under less desirable epistemological circumstances, however, the statistician will only be
able to prove that a less specific proposition $p$ is true. In the extreme case where no further
information exists, he will be forced to say that his knowledge is limited to that provided by
the evidence $\mathcal{E}$, or that it is “vacuous.”

All samples so analyzed, however, can be classified as to the “most specific knowledge”
that could be determined in each case. The probability measure of the set
$e(p)$ of samples where the proposition $p$ was the most specific knowledge (called an epistemic
set by Ruspini) corresponds, in Shafer's framework, to the value $m(p)$ of a mass function $m$,
i.e.,

$$ m(p) = \Pr(e(p)) . $$

Correspondingly, the probability that $p$ was “known” to be true during statistical
experimentation corresponds to the value $\mathrm{Bel}(p)$ of Shafer's belief function, i.e.,

$$ \mathrm{Bel}(p) = \Pr(Kp) . $$

The connection between the ability of our statistician to know that $p$ was true and the
belief and mass functions that he estimates through experimentation justifies both the
expression epistemic probability, introduced by Ruspini [17] to describe the underlying
probabilities defined over a particular set of situations or scenarios Kp (called the epistemic
universe), and their description as being “probabilities of provability” or “probabilities of
necessity” by Pearl [14], following a suggestion by Fagin and Halpern [6].

In short, all such interpretations are equivalent to the original model of Ruspini, where
a rational agent was able to prove the truth of different propositions under different
informational circumstances that were found to prevail, during his statistical experiment, with
different frequencies of occurrence. Note, however, that while use of the terms “knowability,”
“provability,” and “necessity” does much to provide adequate semantics to the calculus
of evidence, their loose usage leads to unnecessary confusion. For example, in his recent
criticism [15], Pearl takes some questionable semantic license with the term “necessity,”
mentioning, for example, the probability that a decision “will have to made out of compelling
necessity.” Such “pragmatic” necessity does not have anything to do, of course, with the
“logical necessity” that underlies the Dempster-Shafer theory, i.e., the necessary truth of a
proposition given available evidence.

² Note that, in the context of epistemic logics such as S5, the operator $K$ behaves as a logical necessity
operator. “Knowing” a proposition simply means that observations logically imply such proposition, or that
it is necessarily true.

Since  the  ability  to  prove  a  proposition  q  entails  the  ability  to  prove  any  proposition  p 
that  is  implied  by  q ,  it  should  be  clear  that 

$$ \mathrm{Bel}(p) = \sum_{q \Rightarrow p} m(q) , $$

i.e., the fundamental equation relating the basic structures of the calculus of evidence. It is
also true, as shown by Ruspini, that

$$ \mathrm{Bel}(p) \le \Pr(p) \le 1 - \mathrm{Bel}(\neg p) , $$

providing bounds for the probability of p that cannot be improved. This ability to
manipulate probability intervals by means of the compact representation scheme of mass functions
is the major reason for the appeal of the Dempster-Shafer methodology.
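
The relation between a mass function and the resulting probability
interval can be made concrete with a short Python sketch. It assumes,
for illustration only, that propositions are represented as subsets
(frozensets) of a finite frame, so that “q implies p” becomes “q is a
subset of p”; the function names and the example numbers are
hypothetical and do not come from the text.

```python
def belief(mass, p):
    """Bel(p): total mass of the focal elements q that imply p (q subset of p)."""
    return sum(weight for q, weight in mass.items() if q <= p)


def plausibility(mass, p):
    """Upper bound 1 - Bel(not p): total mass of focal elements consistent with p."""
    return sum(weight for q, weight in mass.items() if q & p)


# Hypothetical frame of three mutually exclusive states.
frame = frozenset({"a", "b", "c"})
mass = {
    frozenset({"a"}): 0.5,        # evidence proving exactly "a"
    frozenset({"a", "b"}): 0.25,  # evidence proving only "a or b"
    frame: 0.25,                  # vacuous evidence
}

p = frozenset({"a", "b"})
lower = belief(mass, p)        # Bel(p)
upper = plausibility(mass, p)  # 1 - Bel(not p)
```

Any probability distribution compatible with this mass function assigns
to p a value between `lower` and `upper`; the width of that interval is
the mass that the evidence leaves uncommitted with respect to p.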

While the above discussion clarifies the nature of the statistician’s knowledge modeled
by belief and mass functions, doubts might still remain as to their utility to those who were
not involved in their statistical estimation process. Such usage is, however, that made of
any other probabilistic information. The analyst who observes $\mathcal{E}$ does not have the luxury
that was available to the statistician estimating epistemic probabilities, i.e., the ability to
collect additional information that permits a more detailed characterization of the state of
the world, for the same reasons that the user of statistical tables is unable to utilize the
raw data of the estimating statistician. Under such circumstances, the analyst is forced to
rely on the probabilistic estimates provided by the statistician, which are believed on the
basis of the assumed regularity of the repetitive behavior of the system: the epistemological
cornerstone of probabilistic reasoning.

In  other  words,  the  “probability  of  provability”  is  the  best  information  that  is  available  to 
the  analyst;  an  observation  that  not  only  disposes  of  questions  about  its  role  in  probabilistic 
reasoning,  but  also  of  Pearl’s  worries  about  its  use  in  lieu  of  the  obviously  more  desirable 
“probability  of  truth”  [15]: 

“why we should concern ourselves with the probability that the evidence implies A,
rather than the probability that A is true, given the evidence?”

Clearly, we would prefer having the latter, but, unfortunately, we can only measure the
former.

Our interpretation of the major evidential functions and structures also quickly disposes
of erroneous arguments based on unintended interpretations of the intervals defined by
belief functions. Each such interval represents ignorance of a single probability value for a
proposition $p$ under fixed evidential conditions $\mathcal{E}$. If critics choose, for example, to interpret
such intervals as the possible values that conditional probabilities might attain when further
evidence is collected, as suggested by Pearl [13], belief functions will not, indeed, behave
according to such unintended semantics.

In  closing  this  section,  it  is  important  to  mention  other  alternative  views  of  the  structures 
of  the  calculus  of  evidence  such  as  that  recently  proposed  by  Smets  [19],  which  are  based  on  a 
nonprobabilistic  concept  of  belief.  Although  those  models  are  interesting  on  the  strength  of 
their  own  virtues,  we  still  emphasize  that  such  interpretations  are  not  required  to  reconcile 
the  calculus  of  evidence  with  conventional  probability  theory. 

In consideration of our ability to reconcile all structures and formulas of the calculus of
evidence, including Dempster’s formula, with conventional probability structures, such
as inner and outer probabilities, we do not feel strongly compelled to accept alternative
epistemic interpretations. Our skepticism in this regard is further supported by the observation
that, often, such epistemological alternatives are the result of misunderstandings about the
role of certain evidential formulas and processes (e.g., normalization). For the same reasons,
we remain unconvinced about the need to assign several alternative interpretations to the
structures of the calculus of evidence or to its functions, as in the recent suggestion of Halpern
and Fagin [7], which is echoed by Pearl [15].

3  On  Decision  Support 

A criticism of a more fundamental nature is often raised regarding the output of the
calculus of evidence and of generalized interval-probability approaches. Since these methods
often fail, due to basic knowledge deficiencies, to rank decision choices by the value of some
measure that quantifies the desirability of each choice (e.g., expected utility), it is said that
they lack a decision-theoretic apparatus.

These arguments correctly point to the basic knowledge requirement that most
decision problems entail: if a rational choice is to be made, then we must have a proper
informational basis on which to make it. This obvious consideration is twisted, however, to
argue for the necessity of estimating unknown probability and utility values when they are
not available. We do not think that this “pragmatic necessity” argument is either sound or
compelling.

In our view, the calculus of evidence may be used in a straightforward fashion to produce
intervals of possible utility values. When such intervals overlap and cannot be ordered, this
fact simply reflects a basic deficiency in our knowledge. We look upon “pragmatic
justifications” with the same concern that any experimental scientist must show about
proposals to guess what he has not measured: the ability to make decisions in the absence of
knowledge is, in our view, a handicap rather than an advantage of any method.

Far from lacking a decision-theoretic methodology, our approach provides an easily
understandable quantification of the undesirable effects that poor information has on our
decision-making ability, ordering decisions whenever it is rationally possible but advising us that such
ranking is not possible if our knowledge is insufficient. In brief, our approach not only
supports decision making but, through its built-in sensitivity-analysis features, helps us to
determine what must be done to reach a happier epistemological state.³
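
One way in which intervals of possible utility values might be produced
and compared is sketched below in Python. The sketch assumes that a mass
function is given over subsets of a finite set of states and that the
bounds are obtained by letting the mass of each focal element fall on
its worst (respectively, best) state; this construction, the function
names, and the numbers are illustrative assumptions rather than the
procedure of the text.

```python
def expected_utility_interval(mass, utility):
    """Return (lower, upper) bounds on expected utility for one action.

    mass:    dict mapping focal elements (frozensets of states) to masses.
    utility: dict mapping each state to the utility of the action there.
    """
    lower = sum(w * min(utility[s] for s in focal) for focal, w in mass.items())
    upper = sum(w * max(utility[s] for s in focal) for focal, w in mass.items())
    return lower, upper


def preferred(interval_a, interval_b):
    """Return 'a' or 'b' when one interval dominates, else None (overlap)."""
    if interval_a[0] > interval_b[1]:
        return "a"
    if interval_b[0] > interval_a[1]:
        return "b"
    return None  # knowledge is insufficient to order the two actions


# Hypothetical evidence: half of the mass proves "rain", half is vacuous.
mass = {frozenset({"rain"}): 0.5, frozenset({"rain", "dry"}): 0.5}

umbrella = expected_utility_interval(mass, {"rain": 8, "dry": 6})
no_cover = expected_utility_interval(mass, {"rain": 0, "dry": 10})
jacket = expected_utility_interval(mass, {"rain": 4, "dry": 9})

clear_choice = preferred(umbrella, no_cover)  # intervals do not overlap
undecided = preferred(jacket, no_cover)       # intervals overlap
```

When `preferred` returns `None`, the sketch reports that the available
evidence cannot rank the two actions, instead of forcing a choice by
guessing the missing probabilities.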

4  On  Dempster's  Rule  of  Combination 

The  semantic  model  of  the  Dempster-Shafer  theory  also  validates  the  so-called  Dempster’s 
rule  of  combination,  which  permits  the  combination  of  belief  and  mass  functions  corre¬ 
sponding  to  different  evidential  observations,  made  under  certain  conditions  of  independence. 
When  such  conditions  are  not  valid,  use  of  this  formula  leads,  of  course,  to  erroneous  results, 
often,  although  incorrectly,  considered  to  be  an  essential  handicap  of  the  evidential  reasoning 
approach,  rather  than  a  consequence  of  its  misapplication. 

The Dempster formula is, currently, the principal evidence integration mechanism of the
belief-function approach. It was derived in the context of a basic model of the effect of
probabilistic evidence that correctly interprets such evidence as constraints on probability values
rather than as the source of the actual values, which are typically undetermined. It may be
described as an expression that, under certain conditions of independence, yields bounds for
the conditional probability distribution $\Pr(\cdot \mid \mathcal{E}_1, \mathcal{E}_2)$ on the basis of similar bounds for the
probability distributions $\Pr(\cdot \mid \mathcal{E}_1)$ and $\Pr(\cdot \mid \mathcal{E}_2)$.

To understand the conceptual bases for Dempster's formula of combination and its
consistency with conventional probability, we resort to a generalization of the logical model
used before to derive the basic relations of the calculus of evidence. Instead of considering a
single epistemic operator, corresponding to a single statistician or observer, we will consider
two such rational agents, with their knowledge modeled by means of two operators $K_1$
and $K_2$. Each of these rational agents will be assumed to be ignorant of the knowledge
possessed by the other, i.e., as if they were statisticians performing independent experiments
under different evidential conditions $\mathcal{E}_1$ and $\mathcal{E}_2$. Their common knowledge, however, will be
modeled by means of a nonindexed operator $K$ corresponding to a third reliable agent that
aggregates the statistical knowledge gathered by the other two.

Clearly, in a given applicable situation (i.e., the first agent observes $\mathcal{E}_1$ and the second
agent observes $\mathcal{E}_2$), the integrating agent, who does not add any knowledge of his own, will
be able to prove (or to “know” the truth of) a proposition $p$ if the other agents provide
individual items of information that, when combined (i.e., conjoined), imply $p$, as expressed
by the basic combination axiom:

$Kp$ is true if and only if there exist sentences $p_1$ and $p_2$ such that $K_1 p_1$ and $K_2 p_2$ are
true, and such that $p_1 \wedge p_2 \Rightarrow p$.

³ For an example of an approach that incorporates decision-maker preferences into the framework of the
belief-function calculus, the reader is referred to a recent paper by Strat [21].

Using our three operators to generate all possible (i.e., logically consistent) states of
knowledge that may be attained by each of the three agents while assessing the state of a
real system, we may say that each of them has, as was the case before, a knowledge about
the real world that may be represented by the “most specific”⁴ propositions $p_1$, $p_2$, and $p$ that
each has been able to prove (with $p$ being obviously more specific than either $p_1$ or $p_2$). In the
terminology of Ruspini’s semantic model, each of the agents is in an epistemic state, denoted
by $e(p)$, $e_1(p_1)$, and $e_2(p_2)$, respectively, each corresponding to the set of all conceivable states
of the real world (i.e., possible worlds) having such knowledge characteristics.

The following important set equation, relating all of these types of epistemic sets as subsets
of our enhanced epistemic universe, is the basis for the derivation of various evidential
combination formulas:

$$ e(p) = \bigcup_{p_1 \wedge p_2 = p} \bigl( e_1(p_1) \cap e_2(p_2) \bigr) , $$

of which the Dempster combination formula

$$ m(p) = A \sum_{p_1 \wedge p_2 = p} m_1(p_1)\, m_2(p_2) , $$

where

$$ m(p) = \Pr(e(p) \mid \mathcal{E}_1, \mathcal{E}_2), \quad m_1(p_1) = \Pr(e_1(p_1) \mid \mathcal{E}_1), \quad m_2(p_2) = \Pr(e_2(p_2) \mid \mathcal{E}_2) , $$

and where $A$ is a multiplicative factor, is the best known and used.
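
The combination formula above can be sketched in a few lines of Python.
The sketch assumes that propositions are represented as subsets
(frozensets) of a common finite frame, so that the conjunction of two
propositions is the intersection of the corresponding sets, and it takes
the multiplicative factor to be the usual normalization constant
1 / (1 - conflict), where the conflict is the total product mass falling
on contradictory (empty) conjunctions. The function name and the example
values are hypothetical, and the result is meaningful only when the
independence conditions discussed in the text hold.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions under the stated independence assumption."""
    combined = {}
    conflict = 0.0
    for p1, w1 in m1.items():
        for p2, w2 in m2.items():
            p = p1 & p2  # conjunction of the two propositions
            if p:
                combined[p] = combined.get(p, 0.0) + w1 * w2
            else:
                conflict += w1 * w2  # mass assigned to a contradiction
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence: combination undefined")
    factor = 1.0 / (1.0 - conflict)  # the multiplicative (normalizing) factor
    return {p: factor * w for p, w in combined.items()}


# Hypothetical example over the frame {a, b, c}.
frame = frozenset({"a", "b", "c"})
m1 = {frozenset({"a"}): 0.5, frame: 0.5}  # first observer
m2 = {frozenset({"b"}): 0.5, frame: 0.5}  # second observer

m = dempster_combine(m1, m2)
```

Here one quarter of the product mass falls on the contradiction “a and
b,” so the remaining three products are rescaled by 4/3 and the
combined mass function again sums to one.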

Before reviewing the actual process leading to the derivation of Dempster’s formula,
it is important to pause and reflect upon the nature of the above set-theoretic equation and
its usefulness to derive evidence combination formulas.

We may first note that this equation has been derived as a relation between subsets of
possible “epistemological states” that is valid regardless of any assumptions about probabilistic
structures and their properties (e.g., independence). As such, it provides the bases not only
for the derivation of the Dempster formula but actually for a variety of formulas that
bound possible probability values within and without the structures of the Dempster-Shafer
theory.

Basically, this formula provides the basis to extend a probability function Pr that is
known over subsets of the form $e_1(p_1)$ and $e_2(p_2)$ (i.e., over two $\sigma$-algebras) to the set of
unions of sets of the form $e_1(p_1) \cap e_2(p_2)$ (i.e., another $\sigma$-algebra). If such extension can be
made uniquely, as is the case for the Dempster formula, the resulting extension may be
used to generate both the conditional probability $\Pr(\cdot \mid \mathcal{E}_1, \mathcal{E}_2)$ and its associated bounds Bel
and Pl, which are fully compliant with the Shafer axioms. In other less fortunate cases (e.g.,
dependent evidence), such extension is not unique and the lower envelope of the possible
extensions, which is not a probability, will lead to bounds that do not satisfy the axioms of
the calculus of evidence.

⁴ Note that such most specific knowledge always exists and is unique, but for logical equivalences, since
the conjunction of all proved theorems is itself a theorem.

A most important remark that must be made in this regard is that this equation is now
being used to extend the evidential calculus approach, generalizing the notion of
conditional probability through study of the probabilistic relations that define dependencies between
the different types of epistemic sets (i.e., $e(p)$, $e_1(p_1)$, and $e_2(p_2)$). Pearl [15], however,
believes, apparently as the result of his examination of the role of compatibility relations in the
calculus of evidence, that this approach is essentially limited in its expressive ability to
set-theoretic relations between epistemic sets, which correspond to classical logical conditional
statements (i.e., material implications).

In fact, it may be easily seen from our epistemic identity that whenever the conditional
probabilities $\Pr(e_2(p_2) \mid e_1(p_1))$ and $\Pr(e_1(p_1) \mid e_2(p_2))$ are restricted
to take the values 0 or 1,⁵ then this identity may be used to map one body of evidence into
another, i.e., by means of the compatibility relations that such probabilities define.

Since under these assumptions, however, there can be only one proposition $p_2$ for every
proposition $p_1$ such that $\Pr(e_2(p_2) \mid e_1(p_1)) = 1$, and vice versa, then the
compatibility relation that is so defined may be characterized by several implications of
the form

$$e_1(p_1) \Rightarrow e_2(p_2),$$

and of the form

$$e_2(q_2) \Rightarrow e_1(q_1),$$

from knowledge states of one observer to knowledge states of the other, which are useful
to “transfer mass” between propositions. This correspondence must be contrasted with that
following from the limited interpretation given by Pearl who, from knowledge of

$$e_1(p_1) \Rightarrow e_2(p_2),$$

concludes (by contraposition), correctly but narrowly, that

$$\neg e_2(p_2) \Rightarrow \neg e_1(p_1),$$

proceeding  then  to  attach  all  material  implication  paradoxes  (e.g.,  the  “ravens  paradox”)  to 
the  calculus  of  evidence  as  if  they  were  an  essential  methodological  bane.  If  that  were  to  be 
the  case — clearly  it  is  not —  the  same  concerns  should  be  raised  about  the  use  of  conditionals 
in  conventional  probability  calculus. 

The second observation that may be made about the nature of evidence combination, in
general, and the role of our basic set identity to generate combination formulas, in
particular, is that while the functions to be combined are conditional probabilities over
two different evidential sets $\mathcal{E}_1$ and $\mathcal{E}_2$ (i.e., the evidence observed
by two agents), the desired integrated probability is a distribution over
$\mathcal{E}_1 \cap \mathcal{E}_2$ (since we know that both observations are correct).
Except for unusual cases, however, computation of
$\Pr(\cdot \mid \mathcal{E}_1, \mathcal{E}_2)$ entails a “normalization” operation that is
fully consistent with the calculus of probability. Most of the normalization “paradoxes”
are the result of misunderstanding about what is being combined: two different conditional
probabilities rather than two different lower and upper bounds of the same probability
function.⁶

⁵It may be shown from the definition of epistemic sets that, under such conditions,
knowledge of $\Pr(e_2(p_2) \mid e_1(p_1))$ suffices to derive $\Pr(e_1(p_1) \mid e_2(p_2))$.

Focusing now on the rationale for Dempster’s formula, we should notice first that the
epistemic sets $e_1(p_1)$ and $e_2(p_2)$ are such that

$$e_1(p_1) \subseteq \mathcal{E}_1, \qquad e_2(p_2) \subseteq \mathcal{E}_2,$$

i.e., the possible knowledge states of each statistician include awareness of the truth of
the evidence that is observed by each. Furthermore,

$$\mathcal{E}_1 = \bigcup_{p_1} e_1(p_1), \qquad \mathcal{E}_2 = \bigcup_{p_2} e_2(p_2),$$

where $p_1 \Rightarrow \mathcal{E}_1$ and $p_2 \Rightarrow \mathcal{E}_2$, i.e., each
statistician knows something that implies that his evidential observation is true
(otherwise he would not be “counting” that sample).⁷

Assume now that there exists a probability distribution Pr defined over the space of all
possible epistemic states for our observing statisticians and our “integrating” agent. Each
such epistemic state is a possible world that corresponds to a possible state of the world
and to a possible state of knowledge for each agent that, in addition, is consistent with
the laws of logic. We will assume now that, whenever $p_1 \Rightarrow \mathcal{E}_1$ and
$p_2 \Rightarrow \mathcal{E}_2$, it is

$$\Pr(e_1(p_1) \cap e_2(p_2)) =
\begin{cases}
\Pr(e_1(p_1))\,\Pr(e_2(p_2)), & \text{if } p_1 \wedge p_2 \neq \emptyset, \\
0, & \text{otherwise.}
\end{cases}$$


This assumption simply states that, when $\mathcal{E}_1$ and $\mathcal{E}_2$ are both true,
the probability that a rational observer will be in a particular knowledge, or epistemic,
state does not provide any information about the probability of the epistemic state of the
other agent (i.e., beyond ruling out logical impossibilities). In purely formal terms, we
may say that knowledge of values of Pr over sets of the form $e_1(p_1)$ does not provide any
indication, beyond exclusion of logical impossibilities, of the values of Pr over sets of
the form $e_2(p_2)$ and vice versa. The epistemic states of our two agents may be said,
therefore, to be uncorrelated in that knowledge of the state of one of our observers (by our
integrating agent) does not provide any information about the state of the other, save for
elimination of logical impossibilities.

⁶It is fair to say that much of the skepticism raised by the normalization used in
Dempster’s formula can be traced to the exposition given by Shafer [18], which suggests
excessive reliance on unfounded heuristics.

⁷Recall that our observers, or rational agents, are statisticians estimating properties of
certain statistical distributions by classifying each sample using their evidence and
additional sample-dependent knowledge.

Noting now that

$$\Pr(e_1(p_1) \mid \mathcal{E}_1) = \frac{\Pr(e_1(p_1))}{\Pr(\mathcal{E}_1)}, \qquad
\Pr(e_2(p_2) \mid \mathcal{E}_2) = \frac{\Pr(e_2(p_2))}{\Pr(\mathcal{E}_2)},$$

$$\Pr(e_1(p_1) \cap e_2(p_2) \mid \mathcal{E}_1, \mathcal{E}_2) =
\frac{\Pr(e_1(p_1) \cap e_2(p_2))}{\Pr(\mathcal{E}_1 \cap \mathcal{E}_2)},$$

then, whenever $p_1 \wedge p_2 \neq \emptyset$, it is

$$\Pr(e_1(p_1) \cap e_2(p_2) \mid \mathcal{E}_1, \mathcal{E}_2)
= \lambda \Pr(e_1(p_1) \mid \mathcal{E}_1)\,\Pr(e_2(p_2) \mid \mathcal{E}_2)
= \lambda\, m_1(p_1)\, m_2(p_2),$$

from which Dempster’s formula readily follows.

The normalization factor

$$\lambda = \frac{\Pr(\mathcal{E}_1)\,\Pr(\mathcal{E}_2)}{\Pr(\mathcal{E}_1 \cap \mathcal{E}_2)}$$

has been the object of considerable concern by both skeptics and proponents of the calculus
of evidence. The above expression, however, provides the rationale for its usage while
disposing of arguments about its alleged inconsistency with the probability calculus. In
that expression, the denominator $\Pr(\mathcal{E}_1 \cap \mathcal{E}_2)$ appears as the
consequence of the need to derive probability distribution estimates with respect to the
intersection of the two observed evidences $\mathcal{E}_1$ and $\mathcal{E}_2$. The numerator
of that expression simply reflects the need to combine conditional distributions over the
same reference set (i.e., the epistemic universe) while our probabilistic knowledge is
expressed over two of its subsets (i.e., $\mathcal{E}_1$ and $\mathcal{E}_2$).
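The multiplication-and-normalization step described above can be illustrated with a minimal
sketch. It assumes that mass functions are stored as Python dictionaries from focal elements
(frozensets) to exact fractions; the function name `dempster_combine` and the two example
mass functions are hypothetical and are not taken from the report. Because the combined
masses must add up to one, the factor returned here is the reciprocal of the total product
mass that falls on logically compatible pairs.

```python
from fractions import Fraction
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> Fraction) by Dempster's rule.

    Returns the combined mass function and the normalization factor applied to
    the products of masses of logically compatible focal elements.
    """
    unnormalized = {}
    conflict = Fraction(0)
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Compatible pair: its product mass goes to the intersection.
            unnormalized[common] = unnormalized.get(common, Fraction(0)) + mass_a * mass_b
        else:
            # Logically impossible pair: its product mass is discarded.
            conflict += mass_a * mass_b
    if conflict == 1:
        raise ValueError("totally conflicting evidence: combination undefined")
    factor = 1 / (1 - conflict)
    combined = {focal: mass * factor for focal, mass in unnormalized.items()}
    return combined, factor


# Hypothetical example on the frame {x, y}.
half = Fraction(1, 2)
m1 = {frozenset("x"): half, frozenset("xy"): half}
m2 = {frozenset("y"): half, frozenset("xy"): half}
combined, factor = dempster_combine(m1, m2)
# One quarter of the product mass is conflicting, so factor == Fraction(4, 3)
# and each of {x}, {y}, {x, y} receives mass 1/3.
```

Exact fractions are used so that the normalization can be checked without rounding error.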

The essence of the conditions that lend validity to the Dempster formula may be summarized
by saying that its usefulness is confined to the limited, but rather important, cases
where estimates of probabilistic likelihood have been formulated by two rational agents on
the basis of independent observations, while ignoring the evidence available to each other.

If our integrating agent is thought of as being concerned with estimating the probabilities
of certain events when both $\mathcal{E}_1$ and $\mathcal{E}_2$ are true, then we may say
that, whenever the conditions validating Dempster’s formula hold, knowledge of the fact that
a particular sample satisfies $p_1$ tells him nothing about the likelihood of $p_2$ (unless,
of course, $p_1$ happens to be logically inconsistent with $p_2$). Furthermore, whenever our
integrating agent is done with his job, he should find out that estimating this joint
distribution (i.e., over $\mathcal{E}_1 \cap \mathcal{E}_2$) could have been accomplished in
an easier fashion by estimating the marginal distributions over $\mathcal{E}_1$ and
$\mathcal{E}_2$ and deriving the joint distribution by multiplication and normalization.

Other accounts supporting the validity of Dempster’s formula and its consistency with
the probability calculus have been advanced by several authors. A particularly compelling
justification has been recently given by Wilson [22].


5  On  Paradoxes 

Criticisms  of  the  Dempster  formula  may  be  broadly  characterized  as  being  the  consequence 
of  basic  misunderstandings  about  either  its  meaning  or  its  validity. 


In  this  section,  we  examine  three  alleged  paradoxes  of  the  theory  showing  that  the  pur¬ 
ported  inconsistencies  are  actually  the  results  of  conceptual  misunderstandings  or  misrepre¬ 
sentations  of  the  position  of  those  who,  while  generally  supporting  the  calculus  of  evidence, 
are  concerned  with  its  possible  misapplication. 

5.1  The  Three  Prisoner  Problem 


Turning our attention first to concerns about the validity of Dempster’s formula, we may
note that, in general, such examples ignore its scope of applicability, producing
counterintuitive results that are then used to dismiss the methodology as inadequate. Among
those, the “three prisoner” problem discussed by Diaconis and Zabell [5] has been perhaps
the most quoted and discussed.

This problem is one of a variety of examples where the combination formula is used as a
conditioning formula by assuming that one of the mass distributions being combined simply
assigns all its mass to a proposition $p$ in the frame of discernment. Combination of such a
simple support function with another mass function associated with a belief function
$\mathrm{Bel}(\cdot)$ leads to the conditioning formula

$$\mathrm{Bel}(q \mid p) = \frac{\mathrm{Bel}(q \vee \neg p) - \mathrm{Bel}(\neg p)}{1 - \mathrm{Bel}(\neg p)}.$$

In the particular case of the three prisoners problem, concerned with the guilt or innocence
of a prisoner that has been chosen (by the Warden) as the guilty party by random draw among
three candidates $A_1$, $A_2$ and $A_3$, our “logical space” or frame of discernment is simply
the Boolean algebra induced by the three noncompatible propositions

$p_i$: “Prisoner $A_i$ has been found guilty,”

where $i = 1, 2, 3$. Since only one of the three prisoners is chosen by the Warden, we
clearly have

$$\Pr(p_i) = \tfrac{1}{3}, \qquad i = 1, 2, 3.$$

(Note that Pr is actually a conventional probability distribution.)

Prisoner $A_1$ now asks the Jailer to name one of the innocent prisoners other than him,
arguing that such information would clearly be of little help to him as an indicator of his
potential fate. As Pearl notes, if $q$ stands for the proposition “The Jailer names $A_2$ as
one of the innocent,” then application of the conditioning rule leads to the result

$$\mathrm{Bel}(p_1 \mid q) = \mathrm{Pl}(p_1 \mid q) = \tfrac{1}{2},$$

indicating that the conditional probability $\Pr(p_1 \mid q)$ must be exactly $\tfrac{1}{2}$,
instead of the “correct solution”

$$0 \le \Pr(p_1 \mid q) \le \tfrac{1}{2},$$

while also saying, against the correct intuition of $A_1$, that his chances of guilt have
been increased as the result of the irrelevant information provided by the Jailer. From such
an observation, Pearl concludes that the formula is seriously flawed both because of the
counterintuitive result that it produces and for its “collapsing” of a family of solutions
into a single value.
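The value obtained from the conditioning formula can be reproduced with a short sketch. It
assumes, as the criticized treatment does, that the Jailer's statement is modeled simply as
the subset “$A_2$ is innocent” of the frame; the dictionary representation and the function
names `bel`, `bel_given` and `pl_given` are illustrative choices rather than the report's
notation.

```python
from fractions import Fraction

FRAME = frozenset({"A1", "A2", "A3"})  # who has been chosen as guilty


def bel(mass, subset):
    """Belief: total mass of focal elements contained in `subset`."""
    return sum((m for focal, m in mass.items() if focal <= subset), Fraction(0))


def bel_given(mass, query, condition):
    """Conditioning formula: (Bel(q or not p) - Bel(not p)) / (1 - Bel(not p))."""
    not_condition = FRAME - condition
    denominator = 1 - bel(mass, not_condition)
    return (bel(mass, query | not_condition) - bel(mass, not_condition)) / denominator


def pl_given(mass, query, condition):
    """Conditional plausibility obtained by duality from the conditional belief."""
    return 1 - bel_given(mass, FRAME - query, condition)


# The Warden's random draw: mass 1/3 on each singleton.
prior = {frozenset({name}): Fraction(1, 3) for name in FRAME}

# Assumption under scrutiny: the Jailer's answer is treated as "A2 is innocent".
a2_innocent = frozenset({"A1", "A3"})
a1_guilty = frozenset({"A1"})

belief = bel_given(prior, a1_guilty, a2_innocent)
plausibility = pl_given(prior, a1_guilty, a2_innocent)
# Both bounds equal 1/2, i.e. the interval collapses to a single value.
```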

Before proceeding to the discussion of Pearl’s concerns we may note, in passing, that this
problem has been well known as a source of paradoxes and incorrect solutions within the
scope of the conventional probability calculus [2], quite independently of any issues of
validity of its treatment using the Dempster-Shafer calculus. Curiously enough, the
explanations given to describe the conceptual errors leading to incorrect classical
treatments resemble, to some extent, those that shed light on the inapplicability of
Dempster’s formula.

Returning now to the role of Dempster’s formula in this problem, we may first observe that,
although, at first glance, the distributions representing the Jailer’s and Warden’s choices
seem independent, it is actually impossible for the Jailer to tell $A_1$ that $A_2$ is one of
those to be spared if all he knew was that the Warden was choosing the guilty party by
random draw (i.e., he needs to know exactly who is the one chosen for punishment). To use
the terminology of Ruspini’s model, the probability of $A_2$ being named as one of the
innocent depends on the epistemic state of the Warden, thus violating the independence
assumptions of Dempster’s formula. If all possible combinations of truth values for the
propositions $p_i$, $i = 1, 2, 3$, and $q$ are tabulated, together with their probabilities,
as done in Table 1, then it is clear that

$$\Pr(q \mid p_3) = 1, \qquad \Pr(q) = \tfrac{1}{3}(1 + \alpha),$$

where $0 \le \alpha \le 1$ represents the unknown probability that the Jailer will choose to
name $A_2$ rather than $A_3$ as innocent if $A_1$ is actually the one chosen by the Warden as
guilty.


Possible World    Warden’s Choice    Jailer Identifies    Probability
$W_1$             $A_1$              $A_2$                $\tfrac{1}{3}\alpha$
$W_2$             $A_1$              $A_3$                $\tfrac{1}{3}(1-\alpha)$
$W_3$             $A_2$              $A_3$                $\tfrac{1}{3}$
$W_4$             $A_3$              $A_2$                $\tfrac{1}{3}$

Table 1: Possible Worlds in the Three Prisoners Problem


But then,

$$\Pr(q \mid p_3) \neq \Pr(q),$$

violating the assumptions, discussed above, that validate the utilization of Dempster’s
formula (i.e., $\Pr(e_2(p_2) \mid e_1(p_1)) \neq \Pr(e_2(p_2))$). There is not, therefore,
“total mystery,” as Pearl says, as to the incorrect results obtained using Dempster’s
formula. Failing to be applicable, there should be little wonder that it leads to apparent
paradox.
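The possible-worlds tabulation of Table 1 lends itself to a direct numerical check. The
sketch below enumerates the four worlds for a given value of the Jailer's unknown preference
and computes the conventional conditional probability; the function names `worlds`, `prob`
and `posterior_a1_guilty` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction


def worlds(alpha):
    """Rows of Table 1: (Warden's choice, prisoner named by the Jailer, probability)."""
    third = Fraction(1, 3)
    return [
        ("A1", "A2", third * alpha),
        ("A1", "A3", third * (1 - alpha)),
        ("A2", "A3", third),
        ("A3", "A2", third),
    ]


def prob(alpha, event):
    """Probability of the set of worlds satisfying event(guilty, named)."""
    return sum((p for guilty, named, p in worlds(alpha) if event(guilty, named)),
               Fraction(0))


def posterior_a1_guilty(alpha):
    """Pr(p1 | q), where q is 'the Jailer names A2 as one of the innocent'."""
    joint = prob(alpha, lambda guilty, named: guilty == "A1" and named == "A2")
    return joint / prob(alpha, lambda guilty, named: named == "A2")


alpha = Fraction(1, 2)
pr_q = prob(alpha, lambda guilty, named: named == "A2")
pr_q_given_p3 = (prob(alpha, lambda guilty, named: guilty == "A3" and named == "A2")
                 / prob(alpha, lambda guilty, named: guilty == "A3"))
# pr_q == 1/2 for alpha = 1/2, while pr_q_given_p3 == 1: the Jailer's answer
# is not independent of the Warden's choice.
# posterior_a1_guilty(alpha) == alpha / (1 + alpha), which ranges over [0, 1/2].
```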

Although, as clearly shown by this discussion, the incorrect treatment of the three prisoner
problem fails to invalidate Dempster’s rule of combination, we share the concern of Pearl
and others about its wide misapplication, particularly when used indiscriminately to
generate conditional distributions. In our research, we are endeavoring to extend the
original theory to derive expressions that produce and utilize conditional belief
information [16] that incorporates known dependencies between evidential bodies. These
formulas are intended to provide better interval estimates than the typically uninformative
bounds that are supplied by strict derivation of bounds in the absence of additional
information by the expression

$$\mathrm{Bel}(q \mid p) = \frac{\mathrm{Bel}(p \wedge q)}{\mathrm{Bel}(p \wedge q) + \mathrm{Pl}(p \wedge \neg q)},$$

which is mentioned in Dempster’s original paper [4] and that has been the object of recent
concern by several authors [3,7].
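As a rough illustration of the lower-bound expression, the following sketch evaluates it for
a small mass function. The frame, the mass values and the function name
`lower_conditional_belief` are assumptions made for the example; only the formula itself
comes from the text.

```python
from fractions import Fraction


def bel(mass, subset):
    """Belief: total mass of focal elements contained in `subset`."""
    return sum((m for focal, m in mass.items() if focal <= subset), Fraction(0))


def pl(mass, subset):
    """Plausibility: total mass of focal elements intersecting `subset`."""
    return sum((m for focal, m in mass.items() if focal & subset), Fraction(0))


def lower_conditional_belief(mass, frame, q, p):
    """Bel(p and q) / (Bel(p and q) + Pl(p and not q))."""
    supporting = bel(mass, p & q)
    opposing = pl(mass, p & (frame - q))
    if supporting + opposing == 0:
        raise ValueError("the bound is undefined when Pl(p) is zero")
    return supporting / (supporting + opposing)


# Hypothetical mass function on the frame {a, b, c}.
FRAME = frozenset("abc")
quarter = Fraction(1, 4)
mass = {
    frozenset("a"): quarter,
    frozenset("ac"): quarter,
    frozenset("bc"): quarter,
    FRAME: quarter,
}

bound = lower_conditional_belief(mass, FRAME, q=frozenset("a"), p=frozenset("ab"))
# Bel({a}) = 1/4 and Pl({b}) = 1/2, so the bound is (1/4) / (3/4) = 1/3.
```

For the same mass function, the conditioning formula of Section 5.1 gives 1/2, which is
consistent with the remark that the strict bound is the less informative of the two.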

In closing, we feel it is important to address other concerns of Pearl, going apparently
beyond the three prisoners problem, about the counterintuitive nature of the “collapse” that
usage of the Dempster formula often produces, which is manifested by production of a single
conditional probability distribution when conditioning multiple members of a family
$\mathcal{P}$ of probabilities over some specific subset $q$. Just as it is true that all
members of the family of distributions

$$\mathcal{P} = \{\Pr\nolimits_t : t \in [0,1]\}$$

defined in the set $X = \{a, b, c\}$ by the expression

$$\Pr\nolimits_t(x) =
\begin{cases}
\tfrac{1}{2}t, & \text{if } x = a, \\
\tfrac{1}{2}(1-t), & \text{if } x = b, \\
\tfrac{1}{2}, & \text{if } x = c,
\end{cases}$$

are such that $\Pr_t(\{a,b\}) = \tfrac{1}{2}$, despite their variability over other subsets,
it is also true that an extensive family of distributions may collapse into a single
conditional probability without violating any rational or probabilistic principles. Such
“invariants” are, in fact, desirable as elements that simplify the analysis of an otherwise
complex probabilistic problem. For these reasons, we do not feel that, if Dempster’s
conditioning formula is applicable, its reduction of the variability of probability values
should be a particular cause for concern as to its validity.


5.2  The  Spoiled  Sandwich 

While discussing the suitability of the calculus of evidence either as a form of generalized
probabilistic calculus, or as a new theory that intends to capture a novel notion of belief,
Pearl [15] again faults the approach for failing to satisfy the following rationality
principle originally stated by Aleliunas [1]:

“If two diametrically opposed assumptions yield two different degrees of belief in a
proposition Q, then the unconditional degree of belief merited by Q should be somewhere
between the two.”


As natural as such a principle might look at first, the following simple and clever example
of Wilson [23] clearly shows that it is neither intuitive nor appealing, pointing, however,
to the pitfalls of creating or supporting one’s favorite scheme on the strength of
supposedly rational axioms.

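Let $X = \{a, b, c, d\}$ with $A = \{a, b\}$ and $B = \{a, c\}$, so that
$\bar{B} = \{b, d\}$. Consider the family of probability distributions in $X$

$$\mathcal{P} = \{\Pr\nolimits_t : t \in [0,1]\},$$

indexed by a parameter $t$ in $[0,1]$, and defined by

$$\Pr\nolimits_t(\{a\}) = \tfrac{1}{2}t, \qquad
\Pr\nolimits_t(\{b\}) = \tfrac{1}{2}(1-t), \qquad
\Pr\nolimits_t(\{c\}) = \tfrac{1}{4}, \qquad
\Pr\nolimits_t(\{d\}) = \tfrac{1}{4},$$

and let

$$\Pr\nolimits_* = \inf_t \{\Pr\nolimits_t\}.$$

Then, clearly

$$\Pr\nolimits_t(A) = \tfrac{1}{2}t + \tfrac{1}{2}(1-t) = \tfrac{1}{2},$$

and, therefore, it is $\Pr_*(A) = \tfrac{1}{2}$. The conditional probabilities
$\Pr_t(A \mid B)$ and $\Pr_t(A \mid \bar{B})$ are given by the expressions

$$\Pr\nolimits_t(A \mid B) = \frac{\Pr_t(\{a\})}{\Pr_t(\{a,c\})}
= \frac{\tfrac{1}{2}t}{\tfrac{1}{4} + \tfrac{1}{2}t},$$

$$\Pr\nolimits_t(A \mid \bar{B}) = \frac{\Pr_t(\{b\})}{\Pr_t(\{b,d\})}
= \frac{\tfrac{1}{2}(1-t)}{\tfrac{1}{4} + \tfrac{1}{2}(1-t)},$$

from which the lower bounds

$$\Pr\nolimits_*(A \mid B) = \inf_t \Pr\nolimits_t(A \mid B) = 0,$$

$$\Pr\nolimits_*(A \mid \bar{B}) = \inf_t \Pr\nolimits_t(A \mid \bar{B}) = 0,$$

are easily derived. It is clear, however, that

$$\tfrac{1}{2} = \Pr\nolimits_*(A) > \Pr\nolimits_*(A \mid B) = \Pr\nolimits_*(A \mid \bar{B}) = 0,$$

showing that the sandwich principle is violated even within the confines of conventional
probability theory.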
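Wilson's example can be checked numerically with a small sketch. It assumes the reading of
the family given above, with point probabilities $t/2$, $(1-t)/2$, $1/4$ and $1/4$, and it
approximates each infimum by a minimum over a grid of values of $t$ that contains both
endpoints, which is where the minima are attained. The helper names are hypothetical.

```python
from fractions import Fraction


def pr_t(t):
    """Member of the family indexed by t: point probabilities on X = {a, b, c, d}."""
    half, quarter = Fraction(1, 2), Fraction(1, 4)
    return {"a": half * t, "b": half * (1 - t), "c": quarter, "d": quarter}


def prob(dist, event):
    """Probability of a subset of X under a point-probability dictionary."""
    return sum((dist[x] for x in event), Fraction(0))


def cond(dist, event, given):
    """Conventional conditional probability Pr(event | given)."""
    return prob(dist, event & given) / prob(dist, given)


A = frozenset("ab")
B = frozenset("ac")
NOT_B = frozenset("bd")
GRID = [Fraction(k, 100) for k in range(101)]  # includes t = 0 and t = 1

lower_A = min(prob(pr_t(t), A) for t in GRID)
lower_A_given_B = min(cond(pr_t(t), A, B) for t in GRID)
lower_A_given_not_B = min(cond(pr_t(t), A, NOT_B) for t in GRID)
# lower_A is 1/2, while both conditional lower bounds are 0 (at t = 0 and t = 1),
# so the unconditional lower bound does not lie between the two conditional ones.
```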

5.3 Other Ways to Spoil the Sandwich

Although such simple examples should suffice to dispose of concerns about spoiled
sandwiches, we feel that Pearl’s discussion of the problem deserves a more detailed analysis,
mainly because of its philosophical implications to rational thinking. This is particularly
important as loose use of such terms as “assured winnings,” “support,” or “belief” in the
absence of a sound formal interpretive framework may quickly mislead those engaged in the
comparison of alternative methodologies.

In an example, called “the Peter, Paul, and Mary Sandwich problem,” Pearl presents a
betting situation where Mary prepares either a ham or a turkey sandwich, promising to pay
Paul $1000 should he guess correctly the type of sandwich that she has prepared. Not having
a clue as to Mary’s choice, Paul then flips a coin, guessing “ham” if the coin turns up heads
and guessing “turkey” if it comes up tails. Paul, as Pearl notes, behaves like an “incurable
Bayesian,” reckoning that

$$\begin{aligned}
\Pr(\text{win}) &= \Pr(\text{win} \mid \text{turkey})\,\Pr(\text{turkey}) + \Pr(\text{win} \mid \text{ham})\,\Pr(\text{ham}) \\
&= \Pr(\text{tails} \mid \text{turkey})\,\alpha + \Pr(\text{heads} \mid \text{ham})\,(1 - \alpha) = \tfrac{1}{2},
\end{aligned}$$

regardless of the value $\alpha$ of the probability that Mary has actually prepared a turkey
sandwich. Thus, in spite of not being “assured” a win, or having “supporting evidence,” Paul
can invoke the rationality (doubtful, as we already saw) of the sandwich principle and argue
that Paul does not need to engage in unnecessary knowledge acquisition or experimentation
[15]:

“If every possible outcome of an experiment would lead you to choose the same action,
then you ought to choose that action without running the experiment.”

From such an observation, Pearl proceeds to fault the philosophical underpinnings of the
belief-function approach, eventually going as far as to suggest that, should Bayesian
orthodoxy be inapplicable, Dempster’s formula (which, he freely admits, does not play any
role in this example) be replaced by other formulas such as the well-known bounds recently
rediscovered by Halpern and Fagin [7].

In the light of our previous example about the rather inconvenient ability of conventional
probability families to spoil sandwiches, all of these pronouncements look increasingly
suspicious. What, however, may we say is wrong? This question may be answered in two
equivalent ways.

We may say first, keeping ourselves at the informal discussion level, that, often, the
experiments may interact with probabilities in complex ways that, obviously, Pearl has not
considered. Nothing in Pearl’s formalism suggests, for example, that the sandwich has
already been prepared and that it may not be artfully substituted by Mary to assure that
Paul always loses, thus invalidating his hopes of having at least a 50% chance of winning.

The second, more formal, rendering of this observation is again based on the semantic
model of Ruspini. In this, and in other similar problems, we have several agents that
deliberate about the state of the world on the basis of their knowledge and knowledge of
the knowledge of others. If the unary operator K represents the state of knowledge of one of
these agents, then, as observed before, our agent is usually in one of three possible
epistemological states with respect to the validity of a proposition $p$: either he knows
that $p$ is true (denoted $\mathrm{K}p$), or he knows that $p$ is false (denoted
$\mathrm{K}\neg p$), or he may be ignorant of such truth (i.e.,
$\neg\mathrm{K}p \wedge \neg\mathrm{K}\neg p$, denoted $\mathrm{I}p$).

In standard accounts, assuming that knowledge of the truth of a proposition does not affect
the likelihood of truth of other propositions, we are simply concerned with a single form of
conditional probability: that measuring the likelihood of $p$ being true when $q$ is true. In
more complex epistemological situations, we may need to be concerned with such quantities as
$\Pr(\mathrm{K}p \mid \mathrm{K}q)$, $\Pr(\mathrm{K}p \mid q)$,
$\Pr(\mathrm{K}p \mid \mathrm{I}q)$, and the like. In other words,
$\mathrm{Bel}(p \mid q)$ measures the support that knowledge of the truth of $q$ provides to
the truth of $p$, rather than the support provided by the truth of $q$ to the truth of $p$.

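In the Peter, Paul, and Mary sandwich problem, Pearl implicitly assumes that

$$\Pr(\mathrm{K}_{\text{MARY}}\,\text{heads}) = 0,$$

$$\Pr(\mathrm{K}_{\text{MARY}}\,\text{tails}) = 0,$$

$$\Pr(\text{turkey} \mid \mathrm{I}_{\text{MARY}}\,\text{heads}) = \alpha,$$

$$\Pr(\text{ham} \mid \mathrm{I}_{\text{MARY}}\,\text{heads}) = 1 - \alpha,$$

concluding correctly, by application of the total probability law, over the exhaustive and
exclusive set of possibilities

$$\{\mathrm{K}_{\text{MARY}}\,\text{heads},\ \mathrm{K}_{\text{MARY}}\,\text{tails},\ \mathrm{I}_{\text{MARY}}\,\text{heads}\},$$

that Paul has at least a 50% chance of winning.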

This correct use of the total probability law does not mean that, by contrast, one should
assume that the full extent of the conditional information provided by belief functions is
limited to the conditional support functions

$$\mathrm{Bel}(p \mid q) = \Pr(p \mid \mathrm{K}q), \qquad
\mathrm{Bel}(p \mid \neg q) = \Pr(p \mid \mathrm{K}\neg q),$$

as Pearl evidently does. In short, not knowing $p$ is not the same as knowing $\neg p$. The
example of the Peter, Paul, and Mary sandwich shows that one needs to consider states of
ignorance that, when properly accounted for, spoil even the best conceived principles of
rationality.


To fully appreciate the complexity of the problem, suppose that we change Pearl’s implicit
assumptions, bringing the previously absent Peter into the scene as a spy acting on behalf
of Mary. In this new scenario, still consistent with Pearl’s explicit statement of the
problem, Peter, spying on Paul’s coin flipping experiment, alerts Mary who, being rather
artful and deft of hand, substitutes the sandwich so as to make sure that Paul always loses.
In this case,

$$\Pr(\text{ham} \mid \mathrm{K}_{\text{MARY}}\,\text{tails}) = 1, \qquad
\Pr(\text{turkey} \mid \mathrm{K}_{\text{MARY}}\,\text{heads}) = 1,$$

and, most importantly,

$$\Pr\big((\mathrm{K}_{\text{MARY}}\,\text{heads}) \cup (\mathrm{K}_{\text{MARY}}\,\text{tails})\big) = 1,$$

i.e., Mary is never ignorant as to what Paul will bet.
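The contrast between the two scenarios can be made concrete with a minimal sketch. It
assumes a fair coin and encodes each scenario as a table giving the probability of each type
of sandwich for each coin outcome; the names `win_probability`, `mary_ignorant` and
`mary_informed` are illustrative and do not appear in the original discussion.

```python
from fractions import Fraction

# Paul's betting rule: guess "ham" on heads and "turkey" on tails.
GUESS = {"heads": "ham", "tails": "turkey"}


def win_probability(sandwich_given_coin):
    """Pr(win) for a fair coin, given Pr(sandwich | coin outcome) as nested dicts."""
    half = Fraction(1, 2)
    return sum(half * sandwich_given_coin[coin][GUESS[coin]] for coin in GUESS)


def mary_ignorant(alpha):
    """Mary does not know the coin outcome: turkey with probability alpha either way."""
    row = {"turkey": alpha, "ham": 1 - alpha}
    return {"heads": row, "tails": row}


# Mary, alerted by Peter, always prepares the sandwich that Paul did not guess.
mary_informed = {
    "heads": {"turkey": Fraction(1), "ham": Fraction(0)},
    "tails": {"turkey": Fraction(0), "ham": Fraction(1)},
}

chance_when_ignorant = win_probability(mary_ignorant(Fraction(3, 10)))
chance_when_informed = win_probability(mary_informed)
# chance_when_ignorant is 1/2 for any alpha; chance_when_informed is 0.
```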

The  Peter,  Paul,  and  Mary  sandwich  example  does  not,  in  our  view,  invalidate  the 
applicability  of  the  evidential  approach  but  rather  highlights  the  need  to  make  necessary 
discriminations  between  propositional  truth,  knowledge  of  that  truth,  and  the  interplay 
between  such  conditions  that  are  likely  to  be  glossed  over  by  cursory  analyses  based  on 
conventional  approaches. 

5.4  The  Disagreeing  Experts 

Another  common  misunderstanding  regarding  the  role  of  the  Dempster's  combination  for¬ 
mula  is  that  provoked  by  an  example  of  Zadeh  [24]  that,  although  originally  formulated  to 
illustrate  problems  with  its  misapplication,  is  often  described  as  an  indication  of  theoretical 
inadequacy. 

Zadeh’s example concerns two “experts” that assess, in a rather conflicting fashion, the
likelihood of three non-compatible events A, B and C as shown in Table 2. Representation
of each of the expert’s assessments as a mass distribution followed by their combination
with Dempster’s rule yields Pr(B) = 1, indicating that the “true” event is B, an
alternative considered to be rather unlikely by either of the assessors.


Observer    Pr(A)    Pr(B)    Pr(C)
1           0.99     0.01     0
2           0        0.01     0.99

Table 2: Experts Disagree on the State of the World


Although this example is often quoted as an example of the failure of Dempster’s rule,
it is clear that each of the rows in Table 2 defines a conventional probability
distribution, thus suggesting that the problem is likely to lie elsewhere. While one may be
tempted to defend any method of evidence combination by saying that the evidence, however
peculiar, indicates that Observer 1 is ruling out alternative C while Observer 2 is
excluding alternative A, thus leaving only B as the sole possible answer, it is clear, upon
further examination, that the rows of Table 2 cannot possibly be evaluations of the same
probability distribution. If that were the case, then at least one of the experts must be
wrong, since there can only be one correct probability distribution, contradicting the
assumption that they are both reliable. Clearly, if the example is to make any sense, under
any type of probabilistic interpretation, each row must correspond to a different
conditional probability where the conditions correspond to different observations available
to each expert. A simple example, suggested by a recent example used by Kyburg [9] to
address other probabilistic reasoning issues, will help to clarify matters.

In  this  example  we  are  being  asked  to  reason,  on  the  basis  of  available  evidence,  about 
the  taste  and  edibility  of  certain  berries  that  may  be  either  small  or  large,  red  or  blue,  have 
good  or  bad  taste,  or  be  safe  or  poisonous  to  eat.  We  will  assume  that  the  berries  in  question 
are  distributed  according  to  the  distribution  shown  in  Table  3. 


Color    Size     Taste/Edibility    Probability
Red      Small    Good/Edible        99/199
Blue     Large    Bad/Edible         99/199
Red      Large    Poisonous          1/199

Table 3: The Berries Probability Distribution


If now a berry is picked up and found by an expert to be large, he will correctly conclude
from such evidence that

Pr(Good|Large) = 0,  Pr(Poisonous|Large) = 0.01,  Pr(Bad Taste|Large) = 0.99.

Another expert, noticing that the berry is red, will conclude, on the other hand, that

Pr(Good|Red) = 0.99,  Pr(Poisonous|Red) = 0.01,  Pr(Bad Taste|Red) = 0.

Clearly the evidential implications of these two separate observations are identical to the
situation summarized in Table 2. Examination of Table 3, however, reveals that

Pr(Poisonous|Red, Large) = 1,

a correct solution that must rationally be expected from any reasoning method that
purports to be valid.
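The berries example can be verified directly from Table 3. The sketch below computes each
expert's conditional distribution, combines the two by multiplication and normalization
(which is what Dempster's rule reduces to when all mass sits on single outcomes), and
compares the result with the conditional distribution given both observations. The data
layout and the function names are assumptions made for the illustration.

```python
from fractions import Fraction

# Rows of Table 3: (color, size, taste/edibility outcome, probability).
BERRIES = [
    ("Red", "Small", "Good", Fraction(99, 199)),
    ("Blue", "Large", "Bad Taste", Fraction(99, 199)),
    ("Red", "Large", "Poisonous", Fraction(1, 199)),
]
OUTCOMES = ("Good", "Poisonous", "Bad Taste")


def conditional(evidence):
    """Distribution over outcomes given a predicate evidence(color, size)."""
    total = sum(p for color, size, _, p in BERRIES if evidence(color, size))
    return {
        outcome: sum((p for color, size, kind, p in BERRIES
                      if evidence(color, size) and kind == outcome), Fraction(0)) / total
        for outcome in OUTCOMES
    }


def combine_bayesian(m1, m2):
    """Dempster's rule for masses on singletons: multiply, then normalize."""
    products = {outcome: m1[outcome] * m2[outcome] for outcome in OUTCOMES}
    norm = sum(products.values())
    return {outcome: value / norm for outcome, value in products.items()}


given_large = conditional(lambda color, size: size == "Large")
given_red = conditional(lambda color, size: color == "Red")
given_both = conditional(lambda color, size: color == "Red" and size == "Large")
combined = combine_bayesian(given_large, given_red)
# given_large and given_red reproduce the two rows of Table 2, and
# combined == given_both, with all probability on "Poisonous".
```

In this instance only one part in 10000 of the product mass survives before normalization,
which illustrates how large the normalization factor becomes when the reference classes
barely overlap.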

The solution to the puzzle of the disagreeing experts lies in recognizing that there is,
in fact, no disparity of opinion among them. Each is providing quantitative measures of
likelihood with respect to different reference classes. The Dempster formula, as observed
by Zadeh, should never be applied to pool partial information about the same probability
distribution. Furthermore, as shown by a sensitivity analysis of the results of its
application to the berries example, its usage in situations where there is considerable
disparity between reference classes (as suggested by the large normalization factor) should
be discouraged on the basis of practical rather than conceptual considerations.

6  On  Complexity  and  Generality 

The  potential  complexity  of  the  belief-function  approach  to  represent  and  manipulate  interval 
constraints  on  a  family  of  probability  distributions  has  been  often  mentioned  as  a  handicap 
of  the  evidential  reasoning  methodology.  In  spite  of  such  misgivings,  two  major  empirical 
observations  have  indicated  that  the  approach  is  applicable  to  a  wide  variety  of  practical 
problems. 

First, our experience shows that, notwithstanding criticisms based on unrealistic
worst-case scenarios, the approach is computationally efficient. In particular, we have
found that representation of belief functions in terms of basic probabilistic assignments
results in a storage and manipulation scheme that is both economical and easy to understand.
In addition, we have successfully implemented tools, such as summarization and coarsening
operators, which may be effectively utilized to limit representational complexity.

Second,  our  current  functional  operators  have  been  chosen  to  guarantee  that  probabilistic 
information  can  always  be  represented  within  the  scope  of  the  approach,  since  more 
general  constraints  neither  enter  into  consideration  nor  appear  as  the  result  of 
any  of  its  functions. 

The  lack  of  generality  of  the  belief-function  approach  to  represent  general  lower-upper 
probability  constraints  is  well  known  [10].  Our  reliance  on  the  methodology  is  primarily 
the  result  of  practical  considerations:  while  we  would  prefer  to  manipulate  more  general 
constraints  on  probability  values,  compelling  computational  efficiency  arguments  force  us  to 
limit  the  scope  of  the  problems  considered  to  those  capable  of  being  at  least  approximately 
solved  by  a  belief-function  treatment. 
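The limitation can be made concrete with a small check, sketched here under stated assumptions: a lower-probability function is a belief function exactly when its Möbius inverse (the implied mass assignment) is nonnegative. The helper names and the three-element example are hypothetical; the example is the lower envelope of all distributions on {a, b, c} that assign no element more than 0.5.

```python
from itertools import combinations


def subsets(frame):
    """Yield every subset of the frame as a frozenset."""
    items = sorted(frame)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def mobius_inverse(lower, frame):
    """Recover the mass implied by a lower-probability function.

    m(A) = sum over B contained in A of (-1)^(|A| - |B|) * lower(B).
    """
    return {
        a: sum((-1) ** (len(a) - len(b)) * lower[b] for b in subsets(a))
        for a in subsets(frame)
    }


def is_belief_function(lower, frame, tol=1e-12):
    """True when every implied mass is nonnegative (up to tolerance)."""
    return all(w >= -tol for w in mobius_inverse(lower, frame).values())


frame = frozenset({"a", "b", "c"})

# Lower envelope of all distributions with no element above 0.5:
# singletons can drop to 0, every pair has probability at least 0.5.
envelope = {
    s: 0.0 if len(s) <= 1 else 0.5 if len(s) == 2 else 1.0
    for s in subsets(frame)
}

# Vacuous constraints (nothing known) for comparison.
vacuous = {s: 1.0 if s == frame else 0.0 for s in subsets(frame)}

masses = mobius_inverse(envelope, frame)
```

For the envelope, each pair receives mass 0.5 and the whole frame receives mass -0.5, so these perfectly coherent interval constraints cannot be encoded as a basic probability assignment. The vacuous constraints, by contrast, invert to a single unit mass on the frame and pass the check.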

Because  we  are,  in  general,  partial  towards  interpretations  of  evidential  structures  that  are  fully 
compatible  with  probability  theory,  our  current  research  is  directed  toward  the  development 
of  more  general,  yet  efficient,  representation  and  manipulation  methods. 

Our  current  concerns  with  the  manipulation  of  conditional  and  dependent  evidence  (i.e., 
the  evidential  counterpart  of  conditional  probabilities)  show,  for  example,  that,  for  some 
important  problems,  the  results  of  evidential  combination  fall  outside  the  scope  of  its 
representational  capabilities.  In  our  experience,  these  methodological  limitations  are  more 
worrisome  than  any  of  the  supposedly  paradoxical  results  arising  from  its  misuse  or  its  claimed 
lack  of  a  decision-making  apparatus. 

Preliminary  results  [16]  indicate,  on  the  other  hand,  that  the  belief-function  approach 
may  be  used  to  approximate  the  results  of  these  evidential  combination  operations  and 
that  extended  representation  mechanisms  [20]  may  yet  be  developed  to  treat  more  general 
evidential  problems.  This  research  also  shows  the  basic  errors  inherent  in  criticisms  that 
regard  the  belief-function  approach  as  a  fully  developed  methodology  incapable  of  sustaining 
further  enhancement  and  modification.  Having  been  studied  in  depth  for  only  fifteen  years, 
its  technological  status  is  that  of  a  young  discipline  capable  both  of  enhancement  on  its 
own  and  of  combination  with  other  approaches  to  produce  more  general  tools  for  probabilistic 
reasoning.  Far  from  proving  that  we  have  reached  a  technological  plateau,  our  investigations 
indicate  that  much  is  yet  to  be  gained  from  such  a  development  and  integration  process. 